1. Introduction
The Boolean satisfiability problem (SAT), often referred to as the propositional satisfiability problem, represents a fundamental challenge in computer science. This problem involves determining whether there exists an assignment of truth values to the variables of a Boolean formula that makes the formula evaluate to true. If such an assignment exists, the formula is classified as satisfiable (SAT); otherwise, it is labeled as unsatisfiable (UNSAT) [1]. SAT is recognized as a critical computational problem with wide-ranging applications, spanning areas such as decision-making [2], combinatorial optimization [3], and foundational research in theoretical computer science [4].
SAT was the first problem proven to be NP-complete by Cook in 1971 [5], meaning that any problem in the class NP can be transformed into SAT in polynomial time. This foundational result underscores the significance of SAT as a benchmark for evaluating the complexity of computational problems. In practice, SAT has broad applications in fields such as planning [6], verification [7], and scheduling [8]. Furthermore, numerous intricate problems, such as the traveling salesman problem and clique detection [9], can be reformulated as SAT instances, allowing them to be efficiently tackled with advanced SAT-solving algorithms.
1.1. Traditional SAT Solvers
Traditional SAT solvers, such as the Davis–Putnam–Logemann–Loveland (DPLL) algorithm [10] and the conflict-driven clause learning (CDCL) algorithm [11], have played a foundational role in solving Boolean satisfiability problems. The DPLL algorithm is one of the earliest systematic methods developed to solve SAT problems. It incorporates recursive backtracking techniques and heuristic strategies to efficiently determine variable assignments. The CDCL algorithm builds upon DPLL by incorporating conflict-driven clause learning and non-chronological backtracking, which efficiently prune the search space by learning new clauses from conflicts and backtracking to the root cause of the conflict. These improvements have made CDCL the dominant algorithm in modern SAT solvers, excelling at handling large-scale and complex instances.
A defining characteristic of CDCL is its conflict-driven learning mechanism, which enables efficient search space reduction by leveraging conflict information encountered during the solving process [12]. Additional improvements include advanced heuristics such as the Variable State Independent Decaying Sum (VSIDS), which dynamically prioritizes variables based on their involvement in conflicts, helping guide the solver toward the most promising search paths [13]. These continuous innovations underscore the evolving nature of SAT-solving techniques, significantly boosting the efficiency and scalability of modern solvers.
1.2. Learning-Based Approaches to SAT
Recent progress in machine learning, particularly in the field of deep learning, has shown promising potential for addressing SAT-solving challenges [14]. While neural network-based approaches, such as NeuroSAT [15], have achieved significant advancements in satisfiability prediction, traditional approaches like CDCL continue to play a pivotal role in modern SAT solvers. Learning-based approaches for SAT-solving can generally be divided into two categories: traditional machine learning techniques, which assist in selecting algorithms or tuning heuristics for SAT solvers, and deep learning models, which directly predict the satisfiability of SAT formulas through end-to-end learning.
End-to-End Model Approaches: Recent advancements in SAT-solving have leveraged deep learning techniques to tackle the Boolean satisfiability problem. NeuroSAT, introduced in Learning a SAT Solver from Single-Bit Supervision, was the first model to utilize graph neural networks (GNNs) for SAT classification. By modeling SAT formulas as literal-clause graphs and employing message-passing neural networks (MPNNs), NeuroSAT demonstrates the potential of GNNs in capturing structural relationships within SAT formulas. However, its scalability declines notably on larger and more complex problem instances. Building on this foundation, SATformer, as presented in SATformer: Transformers for SAT-Solving, explores Transformer-based architectures for SAT-solving. By integrating positional encoding and leveraging attention mechanisms, SATformer effectively captures long-range dependencies within SAT formulas, achieving promising results on complex instances. Nevertheless, the high computational cost of Transformer-based approaches poses challenges for scaling to larger problems or resource-constrained environments [16]. Other works have also advanced SAT-solving with deep learning. Query-SAT introduced a query mechanism and unsupervised loss functions, enabling iterative refinement of SAT-solving strategies by reducing the search space and enhancing reasoning efficiency [17]. Similarly, a neural network framework using Directed Acyclic Graph (DAG) embeddings captures the inherent topological structures of circuits, improving generalization and predictive accuracy in Circuit-SAT problems [18]. DeepSAT, an Electronic Design Automation (EDA)-driven framework, applies deep learning to enhance SAT-solving strategies in EDA workflows, such as logic synthesis and formal verification [19]. Likewise, GAT-SAT employs graph attention networks (GATs) to improve the accuracy of propositional satisfiability prediction by dynamically capturing inter-node dependencies, serving as a benchmark for comparison in this study [20]. Permutation-invariant architectures also demonstrate superior performance over traditional methods in SAT satisfiability prediction by ensuring input symmetry and enhancing generalization capabilities [21]. Finally, early studies on the potential of GNNs for Boolean satisfiability introduced foundational frameworks, demonstrating the feasibility of graph-based approaches and inspiring further advancements. These studies provided insights into the strengths and limitations of GNNs in SAT-solving.
Learning-aided Approaches: These methods improve traditional SAT solvers by leveraging neural network predictions to develop heuristics, significantly enhancing the performance of CDCL solvers and local search (LS) algorithms. For instance, NeuroCore predicts unsatisfiable cores, enabling periodic adjustments to variable activity scores in CDCL solvers to prioritize challenging instances [22]. Graph-Q-SAT leverages reinforcement learning and GNNs to optimize branching, reducing the number of iterations for solving instances [23]. Glue variable prediction refines branching heuristics by focusing on low-decision-level variables [24], while the GVE algorithm integrates glue variable elimination with graph learning to simplify formulas and boost efficiency [25]. Learning cubing heuristics from DRAT proofs improves variable selection in the cube-and-conquer framework [26], and neural heuristics have been shown to enhance branching in DPLL and CDCL algorithms compared to traditional methods [27]. Techniques like NeuroComb utilize GNNs to predict key variables, incorporating them into dynamic branching strategies for superior performance [28]. Additionally, learned heuristics enhance local search methods by refining diversification and intensification strategies, enabling solvers to address more complex SAT instances [29,30]. These approaches expand the adaptability of traditional solvers, equipping them to tackle a broader range of SAT challenges.
Building on these early advancements, recent research has explored novel ways to leverage graph-based approaches for SAT-solving. Among these, SAT-GATv2 introduces a graph-based modeling approach that transforms SAT formulas into literal-clause graphs. This approach effectively models intricate structural dependencies and generalizes well across diverse SAT problem distributions, especially those with complex structures. Unlike NeuroSAT, which struggles with scalability on larger instances, SAT-GATv2 employs a modular architecture that balances computational efficiency and representational power. This design also enhances adaptability, making it suitable for broader combinatorial optimization tasks like graph coloring and maximum satisfiability. The architectural details and performance evaluations of SAT-GATv2 are discussed in the following sections.
1.3. Contributions
In this paper, we present SAT-GATv2, a neural network framework specifically designed for solving the SAT problem. SAT-GATv2 integrates MPNNs and GATv2 within a novel architecture to effectively capture both local and global dependencies in SAT formulas. By addressing critical challenges such as scalability to large SAT instances and computational efficiency, the model enables robust handling of complex SAT instances. With iterative message-passing and attention-based updates, SAT-GATv2 achieves strong performance on benchmark datasets and demonstrates adaptability to diverse SAT-solving scenarios.
Figure 1 illustrates the SAT-GATv2 model's training workflow, designed as a classifier using a dataset of input–output pairs. Each input consists of a SAT problem P, and the corresponding output is a binary label indicating whether P is satisfiable. During training, the model learns patterns to determine satisfiability from the provided data. In the testing phase, SAT-GATv2 leverages the knowledge gained during training to evaluate new SAT problems and predict their satisfiability.
Overall, this paper’s contributions can be summarized as follows:
Introduction of SAT-GATv2: We propose SAT-GATv2, an end-to-end neural network model designed for solving the SAT problem. By integrating MPNNs and GATv2, the model simultaneously enhances node feature representations and models crucial inter-node relationships. This design achieves significant improvements in prediction accuracy and computational efficiency, making it particularly suitable for large-scale and complex SAT instances;
Enhanced Problem-Solving Capabilities: SAT-GATv2 transforms SAT problems into literal-clause graphs in Conjunctive Normal Form (CNF) and iteratively refines node features through message passing. The dynamic attention mechanism adaptively prioritizes critical features by weighting significant dependencies, enabling efficient handling of SAT instances with varying complexities;
Performance Gains: Experimental results demonstrate that SAT-GATv2 outperforms baseline models, achieving 1.75–5.51% higher accuracy and improved efficiency on random 3-SAT(n) and SR(n) instances. Ablation studies underscore the complementary contributions of MPNNs and GATv2, highlighting their pivotal roles in the model’s robustness and scalability.
The structure of this paper is as follows: Section 2 introduces SAT problem representation and terminology in CNF, along with a review of related work; Section 3 details the SAT-GATv2 model, focusing on its design and implementation; Section 4 describes the experimental setup and results, highlighting the model's performance; finally, Section 5 concludes the paper and suggests potential future research directions.
2. Background and Related Work
This section introduces the SAT problem and its representation in CNF, highlighting fundamental invariance properties critical to SAT-solving. Furthermore, it examines diverse graph-based approaches to representing SAT problems, with a specific focus on the inherent structure of 3-SAT problems. Finally, it discusses the application of GNNs as a promising solution to these challenges.
2.1. SAT Problem and CNF Formula
The SAT problem is a fundamental challenge in computer science and is commonly expressed in CNF. A CNF formula consists of Boolean variables xi, constants true (1) and false (0), and logical operators, including negations (¬), conjunctions (∧), and disjunctions (∨). A formula is satisfiable if an assignment of Boolean values to its variables makes the formula evaluate to true. In CNF, the components are defined as follows:
Literal: A literal is defined as either a Boolean variable or its negation. For instance, xi and ¬xi qualify as literals;
Clause: A clause represents a disjunction of literals. For instance, (x1 ∨ x2) is an example of a clause;
CNF Formula: A CNF formula consists of a conjunction of clauses. For instance, it can be expressed as follows: Φ = (x1 ∨ x2) ∧ (¬x2 ∨ x3) ∧ (¬x1 ∨ x2 ∨ ¬x3).
An example of a satisfying assignment for this formula is x1 = 0, x2 = 1, and x3 = 1, which makes the formula evaluate to true (1).
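To make the example concrete, the short check below evaluates Φ under this assignment. The clause encoding (a positive integer k for xk, a negative integer for ¬xk) follows the DIMACS convention used later in Figure 4; the snippet is purely illustrative and is not part of the SAT-GATv2 pipeline.

```python
# Clauses of Φ = (x1 ∨ x2) ∧ (¬x2 ∨ x3) ∧ (¬x1 ∨ x2 ∨ ¬x3),
# encoded DIMACS-style: k means xk, -k means ¬xk.
clauses = [[1, 2], [-2, 3], [-1, 2, -3]]
assignment = {1: False, 2: True, 3: True}  # x1 = 0, x2 = 1, x3 = 1

def literal_value(lit, assignment):
    value = assignment[abs(lit)]
    return value if lit > 0 else not value

# A CNF formula is satisfied iff every clause has at least one true literal.
satisfied = all(any(literal_value(l, assignment) for l in clause)
                for clause in clauses)
print(satisfied)  # True
```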
Table 1 illustrates the structure of a CNF formula, providing a clear overview of its components, including clauses, variables, and literals. In this example, the CNF formula consists of variables x1, x2, and x3, where each variable represents a Boolean value that can be either true (1) or false (0). The literals {x1, ¬x1, x2, ¬x2, x3, ¬x3} represent the positive and negated forms of the variables, respectively. This example aligns with the components detailed in Table 1, ensuring consistency and clarity.
2.2. Invariance Properties in SAT
In propositional logic, two key invariance properties—Permutation Invariance and Negation Invariance—are essential for understanding and solving SAT problems.
Permutation Invariance: This property states that the satisfiability of a Boolean formula remains unchanged when certain components of the formula are reordered. Specifically, as follows:
Permuting literals within a clause: The arrangement of literals in a single clause can be modified without impacting satisfiability. A clause is considered satisfied if at least one of its literals evaluates to true, and changing the order of literals does not affect this condition;
Permuting the sequence of clauses: The sequence of clauses in the formula can be freely rearranged since the formula’s satisfiability depends on the satisfiability of individual clauses rather than their sequence;
Permuting identical literals across clauses: Identical literals (e.g., x and x, or ¬x and ¬x) across different clauses can be swapped, provided their logical polarity (positive or negative) is preserved within each clause. This operation does not affect satisfiability since the identical literals represent the same truth value.
Taken together, these operations show that the satisfiability of a formula remains unchanged when the ordering or naming of its components (literals and clauses) is altered, as long as the logical relationships between them are preserved. In essence, Permutation Invariance ensures that the formula's satisfiability is independent of the specific arrangement of its components. This symmetry gives SAT solvers flexibility, allowing them to efficiently handle various permutations of the same formula and generalize across different problem formulations.
Negation Invariance: Negation Invariance refers to the property that the satisfiability of a formula remains unchanged when the polarity of a variable is flipped across all its occurrences. Specifically, flipping the polarity of a variable (i.e., changing x to ¬x, or vice versa) does not alter the satisfiability of the formula, as the logical relationships between variables remain intact. For example, flipping all instances of a variable x to ¬x (or vice versa) results in an equivalent formula with the same satisfiability because the overall logical structure remains unchanged. This property ensures that the formula’s satisfiability is unaffected by whether a variable is expressed in its positive or negated form.
2.3. SAT Representation and Formula Graphs
Boolean formulas are often expressed in CNF, where the formula is a conjunction of disjunctions. In this study, we represent Boolean formulas using CNF [31], visualized as a bipartite graph with two types of nodes: literals and clauses. This graphical representation provides a better understanding of the structural relationships within SAT instances [32]. SAT instances can be represented in four primary ways.
Figure 2 presents example diagrams for each representation, highlighting their characteristics and practical applications. The CNF formula used in these examples is (x1 ∨ x2) ∧ (¬x2 ∨ x3) ∧ (¬x1 ∨ x2 ∨ ¬x3).
Literal-Clause Graph (LCG): In this graph, both literals and clauses are represented as nodes, with edges indicating which literals are part of which clauses. LCGs provide a clear visualization of the relationships between literals and their associated clauses;
Variable-Clause Graph (VCG): In VCGs, nodes represent variables and clauses, and edges depict the involvement of variables in specific clauses. This representation helps illustrate how variables are distributed across clauses and their overall role within the formula;
Literal Interaction Graph (LIG): LIGs focus on the interactions between literals that appear in the same clause. This representation omits clause nodes, highlighting the interactions and potential conflicts between literals;
Variable Interaction Graph (VIG): VIGs visualize the dependencies between variables throughout the formula. This representation is useful for understanding how the relationships between variables influence the satisfiability of the formula.
Figure 3 illustrates a Literal-Clause Graph (LCG) alongside its corresponding adjacency matrix for the CNF formula (x1 ∨ x2) ∧ (¬x2 ∨ x3) ∧ (¬x1 ∨ x2 ∨ ¬x3). On the left, the LCG visualizes the nodes representing literals and clauses, with edges indicating the inclusion of literals within specific clauses. This representation emphasizes the structural relationships between literals and clauses. On the right, the adjacency matrix offers a numerical depiction of these relationships, where "1" denotes a connection between a literal and a clause, and "0" represents no connection. This dual representation facilitates a deeper understanding of the formula's structure and aids in the analysis of SAT problems.
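As a minimal sketch of how such a biadjacency matrix can be built (assuming the DIMACS-style clause lists introduced above, with literal rows ordered x1, ¬x1, x2, ¬x2, ...), consider the following:

```python
import numpy as np

def lcg_adjacency(clauses, n_vars):
    """Biadjacency matrix of the literal-clause graph.
    Row 2*(v-1) is literal x_v; row 2*(v-1)+1 is ¬x_v; one column per clause."""
    A = np.zeros((2 * n_vars, len(clauses)), dtype=int)
    for c, clause in enumerate(clauses):
        for lit in clause:
            row = 2 * (abs(lit) - 1) + (0 if lit > 0 else 1)
            A[row, c] = 1
    return A

clauses = [[1, 2], [-2, 3], [-1, 2, -3]]
print(lcg_adjacency(clauses, n_vars=3))
```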
Figure 4 presents an alternative matrix representation of a CNF formula. The formula, denoted by the DIMACS header "p cnf 5 6," consists of five variables and six clauses. On the left side, the clauses are listed along with their literals, where positive numbers represent variables and negative numbers indicate the negation of those variables. On the right, the matrix form is shown, with each row corresponding to a clause and each column representing a variable [33]. In this matrix, a "1" signifies the presence of a positive literal, "−1" indicates a negative literal, and "0" represents the absence of a variable in the clause. While this matrix representation is not directly used in the current paper, it offers a clear visualization of the structure of CNF formulas. The CNF formula corresponding to this matrix is as follows: (x1 ∨ x2 ∨ x3) ∧ (x2 ∨ ¬x3 ∨ x4) ∧ (x1 ∨ ¬x2 ∨ ¬x3) ∧ (x1 ∨ x2 ∨ x4) ∧ (x3 ∨ ¬x4 ∨ ¬x5) ∧ (¬x3 ∨ x4 ∨ x5).
2.4. 3-SAT Problems and Uniform Random 3-SAT
The 3-SAT problem is a specific variant of SAT, where each clause is limited to exactly three literals. As an NP-complete problem, 3-SAT is a fundamental case in computational complexity theory. A critical factor in determining the difficulty of a 3-SAT instance is the "clause-to-variable ratio," which indicates the proportion of clauses to variables [34]. Notably, around the critical ratio of 4.26, 3-SAT problems undergo a phase transition. At this threshold, the probability of satisfiability decreases sharply, transitioning from near certainty to near impossibility. Instances at this critical ratio are particularly challenging for SAT solvers due to the sudden increase in difficulty.
Uniform Random 3-SAT refers to a distribution of SAT instances where 3-CNF formulas are generated randomly. For a formula with n variables and m clauses, each clause is constructed by selecting three literals randomly from a set of 2n literals (variables and their negations). The process excludes clauses that contain duplicate literals or tautologies (where a variable and its negation appear together). This method provides a broad and systematic testing framework for SAT solvers, covering a wide spectrum of problem difficulties. The distribution of random 3-SAT instances is parameterized by n and m, with Uniform Random 3-SAT encompassing all valid combinations of these parameters. This approach offers a comprehensive benchmark for assessing the performance and robustness of SAT-solving algorithms.
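A minimal generator sketch following this description is given below. Sampling three distinct variables per clause rules out both duplicate literals and tautologies, and the default m ≈ 4.26·n (an assumption matching the phase-transition ratio discussed above) places instances in the hard region; satisfiability labels would then come from a solver such as MiniSat, as in Section 4.5.

```python
import random

def random_3sat(n_vars, ratio=4.26, seed=None):
    """Uniform Random 3-SAT sketch: m = round(ratio * n) clauses, each built
    from three distinct variables with random polarity. Distinct variables
    guarantee no duplicate literals and no tautological clauses."""
    rng = random.Random(seed)
    m = round(ratio * n_vars)
    clauses = []
    for _ in range(m):
        variables = rng.sample(range(1, n_vars + 1), 3)
        clauses.append([v if rng.random() < 0.5 else -v for v in variables])
    return clauses

print(random_3sat(n_vars=20, seed=0)[:3])  # first three clauses of an instance
```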
2.5. Graph Neural Networks (GNNs)
GNNs are an advanced extension of traditional deep neural networks designed specifically for processing graph-structured data. Often called "deep learning on graphs" or "geometric deep learning", GNNs have emerged as one of the fastest-growing fields in machine learning research [35]. A key distinction of GNNs is their capability to handle complex, non-Euclidean data structures, enabling them to excel across a wide range of challenging problems. This distinctive ability has attracted considerable attention from both academia and industry.
The foundational concept of GNNs involves iteratively refining a node's representation by integrating information from its neighbors, along with its initial features. As described in [36], this process starts with an initial node representation $H^{(0)} = X$. At each layer, the GNN framework applies two key functions to update the node representations: an aggregation function, which collects messages from a node's neighbors, and a combination function, which merges the aggregated message with the node's current representation.
The general framework of a GNN is mathematically defined as follows [38], with the corresponding equations provided in Equations (1) and (2):

$$a_v^{(k)} = \mathrm{AGGREGATE}^{(k)}\left(\left\{ H_u^{(k-1)} : u \in N(v) \right\}\right) \tag{1}$$

$$H_v^{(k)} = \mathrm{COMBINE}^{(k)}\left( H_v^{(k-1)},\; a_v^{(k)} \right) \tag{2}$$

Here, the initial representation is $H^{(0)} = X$, and $N(v)$ denotes the set of neighboring nodes of node $v$. The node representation $H^{(K)}$ obtained from the final layer is considered the final representation for that node [39]. Once these representations are computed, they can be applied to various downstream tasks. For instance, in node classification, the label of node $v$ is predicted using the Softmax function, as shown in Equation (3):

$$\hat{y}_v = \mathrm{Softmax}\left( W H_v^{(K)} \right) \tag{3}$$

where $W$ is a learnable projection matrix whose output dimension indicates the total number of output labels. During model training, a set of labeled nodes is used, with the loss function minimized as defined in Equation (4):

$$O = \frac{1}{n_l} \sum_{i=1}^{n_l} \mathrm{loss}\left( \hat{y}_i,\, y_i \right) \tag{4}$$

In this equation, $y_i$ represents the assigned label for node $i$, $n_l$ denotes the total number of labeled nodes, and $\mathrm{loss}(\cdot,\cdot)$ specifies the loss function, such as cross-entropy loss. The neural network is trained by minimizing the objective function $O$ through backpropagation.
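Since the framework leaves AGGREGATE and COMBINE open, a minimal sketch helps fix ideas. The layer below is an illustrative choice, not a reference implementation: it uses mean aggregation over the adjacency matrix for Equation (1) and a linear-plus-ReLU update for Equation (2).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleGNNLayer(nn.Module):
    """One AGGREGATE/COMBINE step: mean over neighbors, then a linear update."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.combine = nn.Linear(2 * d_in, d_out)

    def forward(self, H, A):
        # A: (n, n) adjacency matrix; row-normalize for mean aggregation (Eq. 1).
        deg = A.sum(dim=1, keepdim=True).clamp(min=1)
        agg = (A @ H) / deg
        # COMBINE the previous state with the aggregated message (Eq. 2).
        return F.relu(self.combine(torch.cat([H, agg], dim=-1)))

# Usage: H_next = SimpleGNNLayer(16, 16)(H, A); a node-classification head
# would then apply Softmax to a linear projection of the final layer (Eq. 3).
```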
Figure 5 illustrates a basic layer of a GNN model, emphasizing the iterative process of updating node, edge, and global features to refine the graph representation [40]. Notably, while the model updates edge weights En, it does not explicitly use edge embeddings. Instead, it focuses on iteratively refining node and global features. This "graph-in, graph-out" architecture preserves the essential structural relationships of the input graph while enhancing the embedded information within it. As a result, the representations of nodes, edges, and global features are progressively refined across layers [41].
3. SAT-GATv2
The SAT problem is a core NP-complete challenge with wide applications in optimization, AI, hardware verification, and more. Despite considerable progress in SAT-solving techniques, both traditional methods, such as the DPLL algorithm, and machine learning-based models, like NeuroSAT, still face significant challenges when addressing large-scale and complex SAT instances. Traditional solvers often struggle with computational inefficiency, while machine learning models may fail to fully capture the intricate dependencies among variables in highly complex SAT problems.
To address these limitations, we propose SAT-GATv2, an end-to-end neural network model that combines MPNNs with dynamic attention mechanisms from GATv2 within a GNN framework. By converting SAT problems into graph representations, SAT-GATv2 effectively captures variable dependencies. The dynamic attention mechanism of GATv2 enables the model to emphasize relevant features, adapting to complexities across SAT instances. This integration enhances both prediction accuracy and computational efficiency, making SAT-GATv2 particularly effective for solving large and complex SAT instances that present challenges for traditional solvers.
The architecture of SAT-GATv2, illustrated in Figure 6, consists of several key components essential for solving SAT problems. These components leverage graph-based techniques, specifically MPNNs and GATv2, to significantly enhance both prediction accuracy and computational efficiency. The model is structured as follows:
(1) Graph Encoding: The Boolean formula is initially encoded into a literal-clause graph (LCG), represented as an adjacency matrix. This graph-based representation transforms the SAT problem into a structured format, allowing the relationships between literals and clauses to be effectively captured;
(2) Initial Feature Vector Creation: After constructing the graph, initial feature vectors are generated for each literal and clause node via a linear transformation layer. These vectors serve as the foundational input for further processing, establishing a baseline representation that is refined through subsequent layers;
(3) Feature Space Transformation: The initial feature vectors pass through a multi-layer perceptron (MLP), projecting them into a higher-dimensional space. This process enhances the model’s capability to capture complex relationships and dependencies in the data. The deep MLP facilitates the learning of non-linear mappings, essential for modeling the intricate structures in SAT problems;
(4) Message-Passing Neural Networks (MPNNs): MPNNs are employed to iteratively update the feature representations of each node by aggregating information from neighboring nodes. This message-passing process enables the model to capture contextual dependencies and progressively refine the node features for more accurate predictions. However, MPNNs often struggle to capture long-range dependencies, which are critical for solving complex SAT problems. To address this limitation, SAT-GATv2 uniquely combines MPNNs with dynamic attention mechanisms (GATv2), providing a complementary balance between local feature aggregation and global feature prioritization;
(5) Graph Attention Network (GATv2): The integration of MPNNs and GATv2 represents a key innovation of SAT-GATv2. While MPNNs effectively aggregate local features, GATv2 dynamically computes attention scores during training to emphasize critical nodes and edges in the SAT graph. This hybrid approach ensures that SAT-GATv2 captures both local dependencies and global structures, achieving a robust and comprehensive representation of SAT graph structures. After message passing, the updated node features are processed through the GATv2 network. The attention mechanism in GATv2 calculates attention weights between the literal and clause nodes, enabling the model to selectively focus on the most relevant nodes during the learning process. Unlike static attention mechanisms, as used in prior models such as GAT-SAT, GATv2 dynamically adjusts attention weights based on evolving graph structures. This adaptability enables SAT-GATv2 to handle complex SAT instances more effectively, particularly those with intricate dependencies among variables [42];
(6) Classification: Finally, the feature representations of the literal nodes are passed through an MLP layer for classification. An activation function, such as Softmax, is applied to compute a “vote” for each literal. If the average vote exceeds a threshold (e.g., 0.5), the formula is deemed satisfiable; otherwise, it is unsatisfiable.
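The shape-level sketch below traces stages (1)–(6) on the example formula from Section 2 with toy dimensions. It is deliberately simplified: the LSTM-based updates of stage (4) and the attention of stage (5) are replaced by plain sums here, and are treated properly in Sections 3.1 and 3.2.

```python
import torch
import torch.nn as nn

# Stage (1): LCG biadjacency for (x1 ∨ x2) ∧ (¬x2 ∨ x3) ∧ (¬x1 ∨ x2 ∨ ¬x3);
# rows are literals x1, ¬x1, x2, ¬x2, x3, ¬x3; columns are clauses.
adj = torch.tensor([[1., 0., 0.],
                    [0., 0., 1.],
                    [1., 0., 1.],
                    [0., 1., 0.],
                    [0., 1., 0.],
                    [0., 0., 1.]])
d = 8                                   # toy embedding width (the paper uses 128)

L = torch.ones(6, d)                    # Stage (2): shared initial literal vectors
C = torch.ones(3, d)                    # ... and clause vectors
mlp = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))
L, C = mlp(L), mlp(C)                   # Stage (3): feature-space transformation

for _ in range(4):                      # Stage (4): T rounds of message passing
    C = C + adj.t() @ L                 # literals -> clauses (simplified)
    L = L + adj @ C                     # clauses -> literals (simplified)

# Stage (5) would refine L with GATv2 attention (see Section 3.2).
vote = nn.Linear(d, 1)                  # Stage (6): per-literal votes
pred = torch.sigmoid(vote(L)).mean()
print("SAT" if pred.item() > 0.5 else "UNSAT")
```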
The modular design of SAT-GATv2 offers flexibility in adjusting its computational complexity, ensuring scalability to larger problem instances. Within the MPNN module, the number of message-passing layers (L) can be reduced to lower computational overhead while incurring only a minimal impact on model performance. Likewise, the GATv2 module facilitates further optimization by modifying the number of attention heads (h) and feature dimensions (d), both of which directly influence runtime and memory consumption. These configurable parameters enable SAT-GATv2 to strike an effective balance between computational efficiency and model accuracy, allowing it to adapt seamlessly to a wide range of problem sizes and resource constraints. This adaptability underscores the versatility of SAT-GATv2 in addressing diverse challenges within the domain of SAT problem-solving.
3.1. Message-Passing Neural Networks (MPNNs)
In the SAT-GATv2 model, message-passing neural networks (MPNNs) operate in two distinct stages during each iteration [43]. As illustrated in Figure 7, the message-passing process involves updates flowing from literals to clauses (a) and then back from clauses to literals (b).
Message Passing from Literals to Clauses (a): In the first stage (a), literal nodes send their corresponding information to the clauses they are associated with. Each literal, represented by its embedding, transmits messages that are aggregated by the clauses, combining the information from all literals within the clause. This process is formally defined in Equation (5).
Message Passing from Clauses to Literals (b): In the second stage (b), the aggregated information from the clauses is passed back to the literals. This feedback allows the literals to update their embeddings by incorporating the new information from the clauses while retaining their previous state. This iterative process, described in Equation (6), improves the feature representation of the literals by integrating information from neighboring nodes at multiple levels.
The SAT-GATv2 model is designed with specific learning parameters and architectural components aimed at optimizing its performance. The model utilizes two initialization vectors, Linit and Cinit, to initialize the embeddings of literals and clauses, respectively. It also incorporates three multi-layer perceptrons (MLPs), Lmsg, Cmsg, and Lvote, as well as two layer-normalized LSTM units, Lupdate and Cupdate. During the initialization process, a matrix of size 2N × d is created to store the vector embeddings, with each row corresponding to one of the vectors. All rows are initialized using Linit for literals, while a similar initialization process is applied to clauses with Cinit. Through T iterations of message passing, the model progressively enhances the feature representations of nodes by integrating multi-order neighbor information. These updated node feature vectors are then processed by the GATv2 network, which employs multi-attention mechanisms to further refine both the literal and clause embeddings. This comprehensive approach allows SAT-GATv2 to capture the intricate dependencies within the SAT problem, leading to significant improvements in both predictive accuracy and robustness when solving complex instances.
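A compact sketch of this two-stage loop is shown below, assuming the components named above (Lmsg and Cmsg as MLPs, Lupdate and Cupdate as LSTM units). It is a simplified reading of Equations (5) and (6): layer normalization is applied after each LSTM update rather than folded into the cell, and the MLPs are reduced to two layers for brevity.

```python
import torch
import torch.nn as nn

class MPNN(nn.Module):
    """Two-stage message passing with LSTM updates, per Equations (5)-(6)."""
    def __init__(self, d):
        super().__init__()
        self.L_msg = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))
        self.C_msg = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))
        self.L_update = nn.LSTMCell(d, d)
        self.C_update = nn.LSTMCell(d, d)
        self.norm_L, self.norm_C = nn.LayerNorm(d), nn.LayerNorm(d)

    def forward(self, L, C, adj, T=26):
        # adj: (2N, M) literal-clause biadjacency matrix; L: (2N, d); C: (M, d).
        hL, cL = L, torch.zeros_like(L)
        hC, cC = C, torch.zeros_like(C)
        for _ in range(T):
            # (a) literals -> clauses (Eq. 5): clauses aggregate literal messages.
            hC, cC = self.C_update(adj.t() @ self.L_msg(hL), (hC, cC))
            hC = self.norm_C(hC)
            # (b) clauses -> literals (Eq. 6): clause messages flow back.
            hL, cL = self.L_update(adj @ self.C_msg(hC), (hL, cL))
            hL = self.norm_L(hL)
        return hL, hC

# Usage: L_out, C_out = MPNN(d=128)(L0, C0, adj)
```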
3.2. GATv2
The feature representations of nodes in the message-passing network, which integrates both GNN and LSTM architectures, capture information from multi-order neighboring nodes through multiple iterations. The GATv2 network further refines these representations using a dynamic graph attention mechanism, allowing the model to emphasize critical relationships between nodes. In GATv2, each node's features are updated based on the features of its neighbors and their interrelationships, with attention weights determining the influence of each neighboring node. This process results in refined feature representations that emphasize critical nodes and relationships [44], as shown in Figure 8.
Furthermore, GATv2 employs a multi-head attention mechanism, where multiple parallel attention heads capture diverse aspects of node relationships simultaneously. This approach enables the model to aggregate information from various perspectives, thereby yielding a more comprehensive and nuanced representation of the graph structure. The outputs from these attention heads are then combined, either by concatenation or averaging, to produce the final updated node features.
More specifically, given a node i with its feature representation hi, the feature is updated in GATv2 through the following steps:
(1) Attention Weight Calculation: Compute the raw attention scores $e(h_i, h_j)$ between node $i$ and its neighboring nodes $j$ using a learnable parameterized function. In the static formulation, this is achieved by transforming the feature representations of $i$ and $j$ via a shared weight matrix $W$, concatenating them, and applying a LeakyReLU activation, as shown in Equation (7):

$$e(h_i, h_j) = \mathrm{LeakyReLU}\left( a^{\top} \left[ W h_i \,\Vert\, W h_j \right] \right) \tag{7}$$

Alternatively, dynamic attention weights can be computed using Equation (8) [45], where the weights are learned independently for different layers or tasks:

$$e(h_i, h_j) = a^{\top}\, \mathrm{LeakyReLU}\left( W \left[ h_i \,\Vert\, h_j \right] \right) \tag{8}$$

(2) Normalization: Normalize the computed attention scores using the Softmax function to obtain attention coefficients $\alpha_{ij}$, which represent the relative importance of neighboring node $j$ to node $i$, as detailed in Equation (9):

$$\alpha_{ij} = \frac{\exp\left( e(h_i, h_j) \right)}{\sum_{k \in N_i} \exp\left( e(h_i, h_k) \right)} \tag{9}$$

This ensures that the attention coefficients across all neighbors sum to 1, allowing for a probabilistic interpretation;
(3) Weighted Summation: Use the normalized attention coefficients $\alpha_{ij}$ to compute a weighted sum of the neighboring nodes' feature representations. The updated feature representation for node $i$ is calculated as shown in Equation (10):

$$h_i' = \sigma\left( \sum_{j \in N_i} \alpha_{ij}\, W h_j \right) \tag{10}$$

Here, $\sigma$ is an activation function (e.g., ReLU), which introduces non-linearity to enhance feature expressiveness;
(4) Multi-Head Attention: Further refine the feature representation using multi-head attention. Multiple attention heads learn diverse attention coefficients in parallel, and their outputs are concatenated to form the final feature representation, as described in Equation (11):

$$h_i' = \Big\Vert_{k=1}^{K}\, \sigma\left( \sum_{j \in N_i} \alpha_{ij}^{k}\, W^{k} h_j \right) \tag{11}$$

This mechanism enables the model to capture different aspects of the graph structure, improving both robustness and performance.
These steps enable GATv2 to effectively utilize information from neighboring nodes, improving feature representations and enhancing the model’s ability to capture relationships within the graph structure. By integrating multi-head attention, the model ensures comprehensive representation learning, which is critical for tasks like SAT problem-solving.
In these formulas, $W$ denotes the weight matrix, $a$ represents the attention parameter vector, and $\Vert$ indicates vector concatenation; $N_i$ refers to the set of neighboring nodes of node $i$, $\sigma$ refers to the activation function, and $\alpha_{ij}$ denotes the attention coefficient between nodes $i$ and $j$. The multi-head attention mechanism enables the network to learn multiple sets of attention coefficients in parallel through attention heads. These coefficients are then aggregated to form the final node representation $h_i'$, where $K$ is the number of attention heads and $\Vert$ signifies the concatenation of the resulting vectors [46].
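In practice, a layer implementing Equations (8), (9), and (11) is available off the shelf: the sketch below applies PyTorch Geometric's GATv2Conv to the bipartite literal-clause graph of the example formula. The clause-to-literal direction and the head-averaging (concat=False) are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
from torch_geometric.nn import GATv2Conv

d, heads = 128, 8
# Clause -> literal attention on the bipartite LCG; self-loops are disabled
# because the source (clause) and target (literal) node sets differ.
conv = GATv2Conv((d, d), d, heads=heads, concat=False,
                 dropout=0.6, add_self_loops=False)

L = torch.randn(6, d)   # literal embeddings after message passing
C = torch.randn(3, d)   # clause embeddings after message passing
# One edge per literal occurrence in the example formula from Section 2:
# row 0 holds source (clause) indices, row 1 holds target (literal) indices.
edge_index = torch.tensor([[0, 0, 1, 1, 2, 2, 2],
                           [0, 2, 3, 4, 1, 2, 5]])
L_refined = conv((C, L), edge_index)   # Eqs. (8), (9), (11) under the hood
```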
3.3. Prediction
After T iterations of message passing in the SAT-GATv2 model, the feature representations of each node are processed through a dynamic graph attention network (GATv2). This mechanism incorporates two attention strategies: one computes attention scores between literal nodes and their adjacent clause nodes, while the other focuses on clause nodes and their neighboring literal nodes. These attention scores are used to generate weighted feature representations for the final literal and clause nodes. Prediction is made using a sigmoid activation function, as defined in Equations (12)–(14).
The model employs a cross-entropy loss for binary classification to predict the satisfiability of a given Boolean formula. This loss function penalizes the logarithmic divergence between predicted probabilities and true labels, enhancing the model's accuracy. The formulation is defined in Equation (15):

$$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log\left(1 - p_i\right) \right] \tag{15}$$

Here, $N$ is the total number of samples, $y_i$ denotes the true label for sample $i$ (0 or 1), and $p_i$ is the predicted probability for sample $i$.
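A minimal sketch of the voting and loss computation follows. It collapses the two attention passes of Equations (12)–(14) into a simple mean over per-literal votes, so it illustrates the interface rather than the full mechanism.

```python
import torch
import torch.nn.functional as F

def predict_and_loss(literal_votes, label):
    """literal_votes: per-literal logits from L_vote; label: 1.0 (SAT) / 0.0 (UNSAT)."""
    p = torch.sigmoid(literal_votes.mean())   # average vote -> probability
    loss = F.binary_cross_entropy(p, label)   # Eq. (15): binary cross-entropy
    return p > 0.5, loss                      # SAT prediction and training loss

is_sat, loss = predict_and_loss(torch.randn(6), torch.tensor(1.0))
```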
Figure 9 presents a heatmap illustrating the phase transition in the voting behavior of the SAT-GATv2 model for a problem instance with 20 variables (SR(20)). The matrix organizes variables along the rows and model iterations along the columns. Over 30 message-passing iterations, the heatmap captures the evolution of voting confidence, transitioning from light blue (indicating low confidence) to deep red (indicating high confidence). This progression highlights the model’s ability to dynamically refine its predictions and improve accuracy through iterative message passing. The observed color shift demonstrates the effectiveness of the dynamic attention mechanism and message-passing framework in enhancing the model’s reliability and scalability, particularly for capturing complex relationships within SAT problems.
4. Experiments
This section outlines the experimental setup, including the configurations, datasets, evaluation metrics, and results. The experiments evaluate the SAT-GATv2 model’s performance in solving SAT problems. Specifically, random k-SAT(n) instances and the SR(n) dataset are used to assess the model’s effectiveness and robustness.
4.1. Experimental Settings
The experimental datasets consist of random k-SAT(n) instances and small-scale SR(n) problem instances, selected based on benchmarks from prior studies such as NeuroSAT and GAT-SAT. These datasets capture diverse SAT problem structures: random k-SAT instances represent general satisfiability scenarios, while SR(n) instances introduce structured challenges for evaluation. SAT-GATv2 was trained and tested on these datasets, with a focus on predicting satisfiability outcomes. Although these datasets provide meaningful insights into the model’s capabilities, they do not encompass the full diversity of SAT problem complexities. To address this limitation, future work will extend the evaluation to include phase transition problems, known for their critical complexity thresholds, and real-world SAT instances to better assess the model’s scalability and practical applicability.
Random k-SAT: To assess the SAT-GATv2 model's generalization across various problem sizes and complexities, random k-SAT instances were generated, focusing on the 3-SAT variant where each clause contains exactly three literals. The number of variables ranged from 30 to 300, with the clause width fixed at three literals, ensuring consistent problem density. 3-SAT is widely recognized for its computational complexity, making it an ideal benchmark for evaluating model performance [47]. For each variable count n, 10,000 random instances were generated and split into training, validation, and test sets in an 8:1:1 ratio. The experiments were conducted at the satisfiability phase transition, which occurs around a clause-to-variable ratio of 4.26, where the probability of satisfiability is approximately 50%. This phase transition is considered the most challenging region for 3-SAT problems, serving as a rigorous test for evaluating the model's robustness and generalization capability.
SR(n): To further assess the model’s performance, the SR(n) dataset, consisting of structured SAT problems derived from real-world applications, was utilized due to its challenging nature. The focus was on SR(3–10) and SR(10–40) subsets containing problem instances with variable counts ranging from 3 to 10 and 10 to 40, respectively. The dataset, comprising 10,000 instances, was split into training, validation, and test sets in an 8:1:1 ratio.
4.2. Evaluation Metrics
The model’s performance was evaluated as a binary classification task, using a confusion matrix to categorize predictions into four categories: True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN). This approach allows for the calculation of essential performance metrics, such as accuracy, precision, recall, and the F1 score.
Figure 10 presents the confusion matrix, which highlights the relationship between true labels and predicted labels for both SAT and UNSAT classifications.
The evaluation metrics employed in this study are defined as follows:
Accuracy: This metric calculates the proportion of correctly classified instances (both SAT and UNSAT) relative to the total number of instances, as shown in Equation (16):

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{16}$$

Precision: This metric represents the proportion of true positive predictions among all instances classified as positive, as defined in Equation (17):

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{17}$$

Recall: This metric measures the proportion of true positive predictions out of all actual positive instances, as defined in Equation (18):

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{18}$$

F1-Score: The F1-Score is the harmonic mean of Precision and Recall, providing a balanced evaluation of these two metrics, as shown in Equation (19):

$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{19}$$
These evaluation metrics provided a thorough assessment of the model’s performance during both the training and testing phases.
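For completeness, the four metrics can be computed directly from the confusion-matrix counts; the counts in the usage line below are hypothetical, not experimental results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Equations (16)-(19) computed from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts for a 1,000-instance test set:
print(classification_metrics(tp=480, fp=30, tn=460, fn=30))
```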
4.3. Experimental Results
The performance of SAT-GATv2 was evaluated against advanced end-to-end neural network models, including NeuroSAT and GAT-SAT, across multiple datasets. The results consistently demonstrate that SAT-GATv2 outperforms its counterparts, achieving superior accuracy, especially in more complex and challenging instances.
Random 3-SAT Experimental Results:
Table 2 summarizes the accuracy of various models on random 3-SAT problems, showing that SAT-GATv2 achieves improved performance compared to other models [48].
Figure 11 provides a visual comparison of the accuracy of three models—NeuroSAT, GAT-SAT, and SAT-GATv2—on random 3-SAT problems with varying problem sizes (30, 50, 70, 100, 150, 200, 250, 300). The results consistently indicate that SAT-GATv2 achieves higher accuracy across all problem scales, demonstrating its robustness and effectiveness.
SR(n) Experimental Results:
Table 3 presents the performance of different models on SR(n) problems. SAT-GATv2 consistently outperforms the other models across both SR(3–10) and SR(10–40) datasets, highlighting its superior generalization ability across varying levels of problem complexity.
Figure 12 provides a comparative analysis of the performance of SAT-GATv2, NeuroSAT, and GAT-SAT across various datasets, using accuracy, precision, recall, and F1 score as evaluation metrics. SAT-GATv2 outperforms the other models in all metrics, showing particularly strong performance on challenging SR(n) instances. This comparison underscores the robustness and effectiveness of SAT-GATv2’s dynamic attention mechanism in significantly enhancing model performance.
5-SAT Experimental Results: As presented in Table 4, the accuracy of all models declined on the 5-SAT problems due to the increased complexity compared to 3-SAT. However, SAT-GATv2 consistently outperformed both NeuroSAT and GAT-SAT, achieving the highest accuracy of 0.780 on the 5-SAT problem with 30 variables, and maintained its superiority as the problem size increased.
4.4. Ablation Experiment
To evaluate the contributions of the MPNN and GATv2 modules in the SAT-GATv2 model, ablation experiments were performed on three datasets: SR(3–10), 3-SAT(100), and 3-SAT(200). Three configurations were tested, and the results are presented in Table 5. The complete SAT-GATv2 model consistently achieved the highest accuracy across all datasets. To ensure fairness and reliability, all parameter configurations were kept consistent across all experiments.
Group 1 (MPNN only): This configuration uses only the MPNN module, excluding the GATv2 component. It isolates the performance of the MPNN module to evaluate its effectiveness when operating independently with increased iterations.
Group 2 (GATv2 only): In this experiment, only the GATv2 module is used, with the MPNN module excluded. This setup assesses how well GATv2 performs on its own while maintaining the same number of iterations as in Group 1.
Group 3 (alternating): This configuration alternates between the MPNN and GATv2 modules, applying each sequentially: first the MPNN layer, followed by the GATv2 layer. Because the two modules alternate, the total number of module applications is twice the per-module iteration count of Groups 1 and 2. This alternating structure enables the model to capture both global structural patterns from MPNN and fine-grained relationships from GATv2, assessing their combined impact on performance.
The results demonstrate that the MPNN module effectively aggregates information from multi-hop neighbors, capturing global structural patterns by treating all neighboring nodes equally. This ability is especially valuable in SAT problems, where understanding the distribution of literals across clauses is crucial for satisfiability prediction. In contrast, the GATv2 module refines node features using adaptive attention mechanisms, focusing on nodes that play a more critical role in satisfiability due to their positions in the graph [49]. The full SAT-GATv2 model, which integrates both MPNN and GATv2, consistently achieves the highest accuracy across all datasets. This highlights the synergistic effect of combining MPNN and GATv2, enabling the model to capture both global and fine-grained dependencies, thereby delivering superior performance.
To ensure fairness, the total iteration count was kept consistent across all configurations, ensuring that the performance of each module could be independently evaluated while controlling for computational cost. Specifically, Group 1 (MPNN only) and Group 2 (GATv2 only) allocated all iterations to their respective modules, enabling each to fully process the data and maximize its individual learning capacity. In contrast, Group 3 (alternating MPNN and GATv2) divided the iterations equally between the two modules, effectively allowing both to contribute within the same computational budget as the other groups. Although the full SAT-GATv2 model integrates both MPNN and GATv2, which inherently increases the parameter count per iteration, its performance improvements are primarily attributed to the synergy between the two modules. By combining MPNN’s global aggregation with GATv2’s localized attention mechanisms, the model effectively captures both structural patterns and fine-grained relationships within SAT instances.
This experimental design aligns with SAT-GATv2’s objectives, demonstrating its ability to maintain computational consistency across configurations while leveraging the strengths of each module to handle diverse and complex SAT problems effectively.
4.5. Experimental Environment and Parameter Settings
Environment Configuration: The experiments were conducted on a system equipped with an Intel(R) Xeon(R) Platinum 8260 CPU @2.30GHz and an NVIDIA GeForce RTX 3090 GPU. The equipment was provided by Intel Corporation and NVIDIA Corporation, both located in Santa Clara, CA, USA. The PyTorch deep learning framework, version 1.12.1, was used for model training and evaluation. Data generation using Minisat, version 2.0, was handled by the CPU, while the GPU was employed for efficient parallel processing, optimizing computational performance and reducing training time.
Hyperparameter Configuration: The model utilized 128-dimensional feature vectors for literal nodes, clause nodes, and hidden units, consistent with NeuroSAT's setup, to enable meaningful feature representation. Each multi-layer perceptron (MLP) component (Lmsg, Cmsg, Lvote) comprised three hidden layers and one linear output layer, with a regularization factor of 10⁻¹⁰ to reduce overfitting. The message-passing network was iterated 26 times, a choice validated experimentally to ensure sufficient information propagation. The model incorporated 8 attention heads, each with input and output feature dimensions of 128 per layer, except for the final output layer, where the concatenated dimension of 128 × 8 is projected back to 128. Training was conducted using the Adam optimizer with a learning rate of 1 × 10⁻⁵ and a dropout rate of 0.6 to improve generalization. These hyperparameters were selected based on preliminary experiments to balance performance with computational efficiency.
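For reference, these settings can be collected into a single configuration dictionary (a convenience sketch; the key names are illustrative and the original training script is not reproduced here):

```python
# Hyperparameters as reported above, gathered in one place for reference.
config = {
    "embedding_dim": 128,      # literal/clause/hidden feature width
    "mlp_hidden_layers": 3,    # per MLP (L_msg, C_msg, L_vote), plus linear output
    "regularization": 1e-10,   # regularization factor against overfitting
    "mp_iterations": 26,       # message-passing rounds T
    "attention_heads": 8,      # GATv2 heads; 128 x 8 projected back to 128
    "optimizer": "Adam",
    "learning_rate": 1e-5,
    "dropout": 0.6,
}
```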
5. Conclusions and Future Work
In this paper, we proposed SAT-GATv2, a GNN-based model specifically designed to address the SAT problem. By integrating MPNNs and GATv2, SAT-GATv2 effectively captures both local and global dependencies within SAT formulas. Transforming SAT problems into graph-based structures enables SAT-GATv2 to optimize node representations and model critical dependencies, significantly enhancing SAT-solving efficiency. Experimental results across diverse SAT distributions, including SR(n) and random 3-SAT(n), highlight significant improvements in accuracy (1.75–5.51%) and computational efficiency, underscoring the model’s potential for addressing larger and more complex instances. Compared to traditional solvers, SAT-GATv2 demonstrates distinct advantages by integrating learned embeddings with graph-based reasoning, offering superior adaptability to diverse SAT instances and the capability to capture intricate problem structures. Unlike traditional solvers that rely on fixed heuristic rules and lack adaptability, SAT-GATv2 leverages deep learning to automatically extract features and model structural relationships, enabling better generalization across diverse SAT instances. Additionally, while CDCL solvers encounter challenges when dealing with intricate problem structures, SAT-GATv2’s dynamic attention mechanisms effectively capture complex dependencies, demonstrating clear benefits on datasets like SR(n).
The modular design of SAT-GATv2 further enhances its scalability and flexibility, allowing efficient scaling through the adjustment of message-passing iterations and attention parameters. These characteristics not only improve computational efficiency but also provide a strong foundation for integrating SAT-GATv2 with traditional CDCL heuristics, paving the way for hybrid approaches that combine learned embeddings with heuristic-based methods to achieve better efficiency and accuracy. Despite these advancements, SAT-GATv2 has yet to surpass traditional state-of-the-art solvers, indicating room for further improvement. Future work will expand the dataset scope to include phase transition problems near critical complexity thresholds, particularly those with clause-to-variable ratios of 4.26, as well as real-world SAT instances such as hardware verification, planning problems, and software error detection. These datasets will offer richer challenges, enabling a comprehensive evaluation of the model’s generalizability and robustness.
Additionally, advancements in large language models (LLMs) offer complementary strengths to graph-based approaches like SAT-GATv2. LLMs excel at capturing contextual knowledge and enriching SAT-solving by providing deeper semantic understanding [50]. For instance, studies like DiLA [51] show how integrating differential logic layers into LLMs enhances reasoning capabilities, offering valuable insights for hybrid approaches. Inspired by these developments, future research will explore hybrid GNN-LLM models, integrating semantically rich LLM embeddings with GNNs' structural reasoning capabilities. Such models could provide more effective solutions for problems requiring both contextual understanding and structural reasoning, presenting a promising direction for advancing SAT-solving techniques. Future efforts will also focus on refining the GATv2 mechanisms and streamlining the MPNN process to improve model performance and scalability. Additionally, SAT-GATv2 will be extended to broader combinatorial optimization challenges, including graph coloring and maximum satisfiability. These advancements are expected to broaden the model's applicability in artificial intelligence and operations research, establishing SAT-GATv2 as a versatile and scalable solution for complex problem domains.