Using SAT Solvers to Reverse-Engineer FSM Models of Digital Devices

Cherepkov, Danil; Mamoutova, Olga; Dojnikov, Anton; Bolsunovskaya, Marina

doi:10.3390/electronics12224680


Laboratory “Industrial Systems for Streaming Data Processing”, “Digital Engineering” Advanced Engineering School, Peter the Great St. Petersburg Polytechnic University, Polytechnicheskaya ul. 29, Saint-Petersburg 195251, Russia

* Author to whom correspondence should be addressed.

Electronics 2023, 12(22), 4680; https://doi.org/10.3390/electronics12224680

Submission received: 23 September 2023 / Revised: 19 October 2023 / Accepted: 22 October 2023 / Published: 17 November 2023

(This article belongs to the Section Computer Science & Engineering)


Abstract

Inferring a functional specification from an existing digital design is a challenge well suited to reverse-engineering methods. One of the most widely used functional specification formats is a finite state machine (FSM). This article studies the possibility of blind passive specification mining for a digital device, where the device is treated as a “black box”. The presented approach treats an input and output signal waveform as the transition graph of an incomplete deterministic FSM and learns the FSM through FSM minimization. It employs a Boolean satisfiability problem (SAT) solver to find a minimal FSM that complies with observed object behavior. The known approach to identifying state machines in discrete event systems is adapted to operate with variables in the form of coloring and transition tables. The developed implementation produces a synthesizable specification in a hardware description language (HDL) and a state diagram in the Unified Modeling Language (UML). The proposed approach for inferring an FSM from a waveform trace can serve as a supplementary tool during reverse engineering to provide developers with meaningful insight regarding the analyzed device. The presented case study defines metrics of successful FSM inference and applies them to a synthetic FSM and a real-world example FSM to demonstrate the applicability of the approach.

Keywords:

functional specification; FSM; inference; identification; trace analysis

1. Introduction

Reverse engineering (RE) is a development process that applies analysis techniques to an object and produces some form of the object’s specification. Methods of RE include information elicitation, capture, and analysis to help with a system understanding [1]. Apart from providing new insight into the system, the obtained specification may be used in the forward engineering design flow to correct the improper behavior of the system or to update or upgrade the system according to new requirements (see Figure 1). RE is a widely recognized legitimate engineering discipline and is used throughout a system’s life cycle to manage and maintain current projects and to address legacy systems [2].

A specification is a set of documents describing a device’s structure and behavior. When a system under study is an electronic device, RE methods abundantly cover structural description inference [3]. However, inferring a functional specification, which is often needed in addition to a structural specification, is usually done manually. This paper presents the implementation of an approach to automate the retrieval of functional specifications by analyzing signals at the device’s inputs and outputs.

One common type of functional specification in hardware and software design is the finite state machine (FSM). An FSM is a tuple $\langle X, Y, Q, q_{0}, \beta, \lambda \rangle$, where $X$, $Y$ and $Q$ are the sets of input and output symbols and internal states of a machine, $q_{0} \in Q$ is the initial state upon the system reset, and functions $\beta : X \times Q \to Q$ and $\lambda : X \times Q \to Y$ are the transition and output functions, respectively.

This paper presents an approach to infer a synthesizable description of a deterministic FSM from an analog or digital waveform of input and output signals of a digital device. The idea behind the implemented process is that a long trace can be interpreted as a series of FSM transitions, with each trace step corresponding to a separate FSM state [4]. Then equivalence-based coloring can be found for a series of FSM transitions to provide a minimal FSM. Figure 2, Figure 3 and Figure 4 present an illustrative example of such coloring and minimization.

Identifying an FSM is a known NP-complete problem [5]; hence, one approach to finding a minimal FSM is to use a Boolean satisfiability problem (SAT) solver [6,7,8]. This approach is currently a preferred method of FSM minimization since it provides a guaranteed exact solution and modern computing technology can cope with its computational complexity. Compared to heuristic approaches [9], the SAT-based method’s inherent ability to identify equivalent states allows “folding” the unfolded behavior of real cycles in the system.

Our implementation of the approach adapts Ulyantsev’s method [10] to digital designs and produces a synthesizable hardware description language (HDL) specification of the FSM and a visual specification in the form of a state transition diagram in the Unified Modeling Language (UML). The main aim of this paper is to study the feasibility of this specification retrieval approach and its implementation using two digital devices as examples: a synthetic FSM with operational cycle variations and the TCS encoder as a real-world example.

The rest of the paper is structured as follows. Section 2 surveys the existing RE approaches that produce graph models. Section 3 presents our implementation of the SAT-based RE approach for a digital device. Section 4 describes an experimental evaluation of the approach on two example devices. Finally, Section 5 summarizes the contribution of this work and includes a discussion of the method’s scope of use and limitations.

2. Relevant Research

The implemented RE approach performs state machine learning, a classic computer science problem. In a seminal paper, Angluin [11] describes an algorithm for actively learning a deterministic FSM from examples and counterexamples. This L* algorithm is still the basis for active automata learning methods. Lee and Yannakakis [12] and Kudryavtsev et al. [13] provide a thorough review of automata identification by active experiments (automata testing). Another body of theory covers the passive automaton identification, given the example of its behavior, by reducing its length [4,5,14].

The lack of system specifications, namely functional specifications, highlights the relevance of automata learning for practical applications in different engineering fields and has driven a significant body of recent research. The following literature review covers the recent publications on the topic.

One research direction in model mining employs artificial intelligence techniques, namely machine learning. Machine learning algorithms allow the replication of a system function in the form of a model, which takes the form of a neural network trained on samples of the system’s behavior [15]. Despite the flexibility of the approach, its black-box nature due to the inherent lack of explicitly specified functional behavior makes this kind of method inapplicable for the digital design flow, where human insight into a function is necessary. Hence, we did not include machine learning methods in the following review.

Software systems RE solves a wide range of tasks, including specification mining in general [16,17,18], verification [19], and model checking [20]. Some approaches target specific challenges, such as malware similarity analysis [21]. The resulting graph models are usually in the form of classic or extended state machines [22,23], Petri nets [24], or software models such as message sequence graphs [25]. Passive analysis techniques include pattern matching [23], the k-Tail algorithm [22,26], information mapping [20], similarity analysis [21], and state merging [18].

Protocol RE is where the software under study is known to have a state machine implementation [27,28]. For example, the ReverX tool [29] performs the following steps. It conducts automatic protocol language inference, derives a protocol state machine by clustering similar messages, and—for all observed network sessions—builds a prefix tree acceptor, which is a tree-like state machine. It uses a heuristic approach to merge equivalent states. ReverX is an example of a passive analysis approach. Protocol RE approaches predominantly use active analysis [30]. For example, Zhang et al. [31] employed the query-driven state-merging algorithm (QSM) to mine protocol specifications, and the flexfringe tool [32] uses an evidence-driven blue-fringe state-merging algorithm.

Embedded system RE focuses on the electronic system level. Tsai et al. [33] propose deriving a high-level control flow structure of a parallel program from the traces intended for the trace-driven simulators. Iegorov and Fischmeister [34] address the problem of interleaving independent system activities in execution traces. They propose mining task precedence graphs (TPGs) by creating a complete set of occurrence patterns, building a TPG conformant to one of the traces, and then training the TPG on the remaining traces. Both approaches only consider the software level of an embedded system. Jeppu et al. [35] address the problems of hardware and software co-design. They propose a new model learning method that integrates a SAT-based approach with program synthesis techniques to capture the transaction level of the system’s behavior. This approach uses instrumented source code with print statements to produce a trace, which makes it inapplicable for a “black box” RE object.

RE methods for hardware elements of electronic systems predominantly consider an object as a “black box” and operate with sequences of discrete values. Li et al. [36] address the problem of digital design validation. Their approach derives the high-level functional description of a gate-level netlist by identifying the elements and finding their closest matches in the component library. Estrada-Vargas et al. [37] propose a heuristic approach to recover Petri net models for programmable logic controllers (PLC). Wu and Dai [38] derive a finite-state model in IEC 61499 format using correlation analysis preceded by data type identification and input–output pair matching. Chivilikhin et al. [39] also derive a finite-state model in IEC 61499 format but with a SAT-based approach. We adopt a modification of the latter approach to obtain a synthesizable FSM description of a discrete event system.

3. FSM Identification Method

The main idea behind the method in the presented RE implementation is to assign the same color to equivalent vertices of the trace graph. Merging the vertices with the same color will produce a smaller graph. The state machine represented by this smaller graph, with a number of states equal to the number of colors, will satisfy the given trace. When the number of colors used is minimal, the resulting FSM will be minimal. Being NP-complete, this FSM minimization task is reducible to SAT.

Based on this approach, the presented RE implementation includes the following processing steps to infer an FSM:

1. After observing a system, an obtained trace is interpreted as an unminimized FSM and transformed into a set of Boolean clauses.
2. A SAT solver is run iteratively with a set of generated clauses to obtain a minimal FSM.
3. An obtained solution is interpreted as an FSM to generate a synthesizable FSM specification and a corresponding graphical specification.

We will illustrate the process on an example trace with $n = 8$ initial states, with $m = 3$ being the number of states after minimization, and $k = 4$ being the range of input symbols $X$.

3.1. Trace Format

A trace is a digitally stored representation of observed events in a system’s behavior. To obtain a trace, an object system often needs to be appropriately instrumented to provide the means for further monitoring of the system events. A particular type of event in a trace and its notation vary depending on the object type and objectives of the analysis. In addition, when the pursued specification reflects the temporal properties of the system, events will be explicitly or inherently annotated with timestamps.

For example, an event in a software execution trace can be a log message about a function/method call or changes in observable system variables and parameters. Log messages are strings with arbitrary notations. For a grammar, an event is an input symbol of an accepted sequence, as in the Abbadingo format. For a network protocol, an event in a communication trace can be a packet or a message, often in the packet capture format (.pcap).

For a digital device, a trace may take the form of a chronologically ordered sequence of $\langle X_{i}, Y_{i} \rangle$ pairs, where $X_{i} \in X$ and $Y_{i} \in Y$ are the tuples of the observed input and output values in discrete time. Such a trace can be obtained with an oscilloscope or a logic analyzer, which allows monitoring and measuring electrical signals in a circuit. Connecting the measuring equipment to the inputs and outputs of the RE object provides the necessary observations of the system’s behavior. After a long enough observation, the input and output values can be recorded as an array of input and output symbol pairs.

The measuring equipment can usually convert the sampled raw dataset into a comma-separated value format (.csv), which is convenient for storing and processing trace information. The current method implementation takes a trace as a .csv file, and then, in the case of analog measurements, the signals are further converted into a digital form by quantization, limiting the range of $X$ and $Y$ to a finite number of symbols.
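The quantization step can be illustrated with a minimal Python sketch. The single fixed threshold, the most-significant-bit-first packing, and the function names `quantize` and `to_symbol` are assumptions made for illustration only; the source does not describe how the implementation chooses quantization levels.

```python
from typing import Sequence, Tuple


def quantize(samples: Sequence[float], threshold: float) -> Tuple[int, ...]:
    """Map each analog sample to a logic level: 1 above the threshold, else 0."""
    return tuple(1 if value > threshold else 0 for value in samples)


def to_symbol(bits: Sequence[int]) -> int:
    """Pack a tuple of bits (most significant first) into one integer symbol."""
    symbol = 0
    for bit in bits:
        symbol = (symbol << 1) | bit
    return symbol


# Hypothetical measurement row: two input channels and one output channel, in volts.
row_inputs, row_outputs = [3.1, 0.2], [2.9]
x = to_symbol(quantize(row_inputs, threshold=1.65))   # input symbol X_i
y = to_symbol(quantize(row_outputs, threshold=1.65))  # output symbol Y_i
```

With this encoding, every trace row becomes one pair of small integers, which keeps the sets of input and output symbols finite.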

These values represent the transition path of the FSM, reflecting the system’s operation during a specified period. Each pair in a trace describes one state and one transition in a trace graph (see Figure 2 for an illustrative example of such a graph).

This automaton can be interpreted as either a Moore or a Mealy state machine. In a Moore FSM, the output value is a function of the current state, $\lambda : Q \to Y$, which makes the FSM analysis much simpler. Because the Moore machine can be easily transformed into an equivalent Mealy machine, this way of trace interpretation does not limit the resulting specification. Henceforth, traces will be interpreted as transitions of a Moore FSM.

3.2. SAT Solver as a Tool for SAT Problems

The SAT problem asks whether a given Boolean expression has any combination of variable values that makes the expression evaluate to 1 (TRUE, or “satisfied”).

Methods for solving the SAT include both automatic algorithms and manual methods. Some of the commonly used automatic algorithms include backtracking, constraint satisfaction, and conflict-driven clause-learning algorithms. In addition to these algorithms, manual methods, such as branch-and-bound and reduction methods, are also available. While these algorithms and methods can effectively solve SAT problems, they may not be powerful enough for a problem with a large number of variables. For solving such complex problems, SAT solvers are used, providing a more efficient and accurate solution than manual methods. In particular, the RE implementation presented in this paper uses a SAT solver to solve problems.

These tools use branching algorithms and assumptions. The solver first assumes a value for a variable and then branches over the remaining options; if the assumption turns out to be incorrect, the solver takes a step back and makes another assumption, and so on, until it finds a solution for the entire problem. For the solver to work, the problem must be described in a format that the solver understands. In most cases, the DIMACS (Center for Discrete Mathematics and Theoretical Computer Science) format is used, which is the standard way to store a formula in conjunctive normal form (CNF). This is a textual format where the first line stores information about the problem (the number of variables and the number of clauses), followed by lines with clauses. The solution returned by the SAT solver is a string listing all the variables with or without negation, where a negated variable denotes FALSE and a positive one denotes TRUE in the assignment that satisfies the problem.
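The assume-and-step-back behavior described above can be sketched as a small recursive backtracking search in Python. This is only an illustration of the principle under simplifying assumptions: it is not the algorithm used by CryptoMiniSat, and the function name `solve` and the dictionary-based assignment are hypothetical choices.

```python
from typing import Dict, List, Optional

Clause = List[int]  # DIMACS-style literals: +v means v is TRUE, -v means v is FALSE


def solve(clauses: List[Clause],
          assignment: Optional[Dict[int, bool]] = None) -> Optional[Dict[int, bool]]:
    """Return a satisfying assignment, or None if the clauses are unsatisfiable."""
    assignment = dict(assignment or {})
    remaining = []
    for clause in clauses:
        if any(assignment.get(abs(lit)) == (lit > 0) for lit in clause):
            continue  # the clause is already satisfied
        rest = [lit for lit in clause if abs(lit) not in assignment]
        if not rest:
            return None  # every literal is false: conflict, step back
        remaining.append(rest)
    if not remaining:
        return assignment  # all clauses are satisfied
    variable = abs(remaining[0][0])
    for guess in (True, False):  # assume a value, then try the opposite one
        result = solve(remaining, {**assignment, variable: guess})
        if result is not None:
            return result
    return None
```

Variables that never need a value to satisfy the clauses are simply left out of the returned dictionary.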

The presented RE implementation uses CryptoMiniSat [40] as its solver of choice. This solver has often won solver competitions, showing the best performance, and can easily be invoked from Python 3 code. Other solvers can also be used. For example, solvers such as MiniSat [41] and PySAT [42] are available through Python 3 wrappers. However, those wrappers require elaborate settings. Moreover, their improved performance covers other solver functions not utilized in the implemented method.

Another approach to state machine minimization is to reduce the problem to a constraint satisfaction problem (CSP). CSP is a more general problem than SAT since it operates with sets of variables and a larger set of values than SAT. For the task of FSM minimization, the syntax and capabilities of CSP exceed what is needed, and we consider them redundant for the task of digital device RE.

3.3. Preparing Data for the SAT Format

To bring an FSM minimization problem to a SAT problem, two groups of Boolean variables should be defined: $C = (A, B)$. Those variables define the graph structure of a minimal FSM: the first group $A$ defines the equivalence of states, while the second group $B$ defines the transitions between the states of a minimal FSM. Their concatenation $C$ forms the variables in the CNF for a SAT solver. The lists of variables are created automatically: the number of variables and their corresponding indices are defined by the trace length $n$, the target minimal number of states $m$, and the number $k$ of input symbols $X$.

For illustrative purposes, the variables can be organized in the form of two tables: the coloring table with variables $A$ (see Table 1) and the table of transitions of the FSM with variables $B$ (see Table 2).

The coloring table (Table 1) has dimensions $n \times m$. Integer values in the cells are the indices $s$ of the variable $C_{s} = A_{i, j}$, indicating which state $Q_{i}$ of the trace is matched as equal with which state $Q_{j}^{*}$ of the minimal FSM. Here, $i \in [0 \dots n - 1]$ is the number of a state in a trace (rows), and $j \in [0 \dots m - 1]$ is the number of a state in a minimal FSM (columns).

The transitions table (Table 2) represents all possible transitions in a minimal FSM and has dimensions of $m^{2} \times k$. The integer values in the cells here are the indices $s$ of the variable $C_{s} = B_{t, x}$, indicating the presence of a transition $t$ or its absence upon the $x$ input symbol in the minimized FSM. Here, $t = \{j', j''\} \in [0 \dots m^{2} - 1]$ represents the number of a transition between the states with indices $j'$ and $j''$ (rows), and $x \in [0 \dots k - 1]$ represents the number of an input symbol from $X$ (columns).

In this example, cells of both tables are filled with indices of $C$ in ascending order. For $s$ being the index of the variable from $C$, the following is a numbering scheme. For the coloring table, where indices $s \in [1 \dots n \cdot m]$:

$$s = 1 + i \cdot m + j, \tag{1}$$

and for the transitions table, where $s \in [n \cdot m + 1 \dots n \cdot m + m^{2} \cdot k]$:

$$s = (n \cdot m + 1) + t \cdot k + x = (n \cdot m + 1) + (j' \cdot m + j'') \cdot k + x. \tag{2}$$
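The numbering scheme of Equations (1) and (2) can be written as two small Python helpers. The function names `a_index` and `b_index` are hypothetical and chosen only for this sketch; the formulas themselves follow the equations above.

```python
def a_index(i: int, j: int, m: int) -> int:
    """Index s of the coloring variable A[i, j], Equation (1)."""
    return 1 + i * m + j


def b_index(j_from: int, j_to: int, x: int, n: int, m: int, k: int) -> int:
    """Index s of the transition variable B[(j_from, j_to), x], Equation (2)."""
    t = j_from * m + j_to  # row of the transitions table
    return (n * m + 1) + t * k + x


# Parameters of the illustrative example: n = 8, m = 3, k = 4.
n, m, k = 8, 3, 4
first_a, last_a = a_index(0, 0, m), a_index(n - 1, m - 1, m)
first_b, last_b = b_index(0, 0, 0, n, m, k), b_index(m - 1, m - 1, k - 1, n, m, k)
```

For the example parameters, the coloring variables occupy indices 1 to 24 and the transition variables occupy indices 25 to 60, which matches the index ranges stated above.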

For the SAT solver, the meaning of the variables is insignificant since the solution does not depend on it, and the solver operates only with the indices of the variables. However, the correspondence between the variable index and its meaning in the definition of the FSM is necessary for the automatic creation of all the conditions and further interpretation of the SAT solution.

3.4. Construction of the FSM Minimization Problem

The data structure is presented in a convenient form of two sets of logical variables and can be used to build the constraints that form a SAT problem. The clauses are built automatically using an adapted and modified version of the approach originally described by Ulyantsev et al. [10]. The modification presented here allows the solution to be interpreted as a synthesizable hardware implementation of the FSM. All the clauses are grouped by the type of condition that restricts the graph of the minimal FSM.

Condition 1: Each state from the trace must be defined as one of the states of the minimal FSM. This condition applies to each state of the trace FSM and requires at least one TRUE value in the corresponding row of the coloring table.

$$A_{i, 0} \lor \dots \lor A_{i, m - 1} : \forall i \in [0 \dots n - 1]. \tag{3}$$

Condition 2: Each state from the trace must be defined as only one of the states of the minimal FSM. This condition also applies to each state of the trace FSM but requires that only one value in the corresponding row of the coloring table be TRUE.

$$\neg A_{i, j'} \lor \neg A_{i, j''} : \forall i \in [0 \dots n - 1]; \forall j', j'' \in [0 \dots m - 1]; j' < j''. \tag{4}$$
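Conditions 1 and 2 together state that each row of the coloring table contains exactly one TRUE value. A minimal Python sketch of the clause generation, assuming the numbering of Equation (1) and a hypothetical function name `coloring_clauses`, could look as follows.

```python
from itertools import combinations
from typing import List


def coloring_clauses(n: int, m: int) -> List[List[int]]:
    """Clauses of Conditions 1 and 2 as lists of DIMACS-style literals."""
    def a(i: int, j: int) -> int:
        return 1 + i * m + j  # Equation (1)

    clauses = []
    for i in range(n):
        # Condition 1: trace state i has at least one color.
        clauses.append([a(i, j) for j in range(m)])
        # Condition 2: trace state i has at most one color.
        for j1, j2 in combinations(range(m), 2):
            clauses.append([-a(i, j1), -a(i, j2)])
    return clauses


clauses = coloring_clauses(n=8, m=3)
```

For the illustrative example with 8 trace states and 3 colors, each row contributes one clause for Condition 1 and three clauses for Condition 2.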

Condition 3: The output symbols of the states of the minimal FSM must be the same as the output symbols of the states from the trace, according to their coloring. This is done by comparing the output values $Y_{i}$ in the trace with the corresponding output values in the coloring table and ensuring their compliance. There should be no pairs $(i', i'')$ of states in the trace with the same coloring $j$ but different output symbols.

$$\neg A_{i', j} \lor \neg A_{i'', j} : \forall j \in [0 \dots m - 1]; \forall i', i'' \in [0 \dots n - 1]; i' < i''; Y_{i'} \neq Y_{i''}. \tag{5}$$
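Condition 3 can be sketched in Python as a loop over pairs of trace states with different outputs. The function name `output_clauses` and the representation of the trace outputs as a list of integers are assumptions of this sketch.

```python
from itertools import combinations
from typing import List, Sequence


def output_clauses(outputs: Sequence[int], m: int) -> List[List[int]]:
    """Clauses of Condition 3: states with different outputs never share a color."""
    def a(i: int, j: int) -> int:
        return 1 + i * m + j  # Equation (1)

    clauses = []
    for i1, i2 in combinations(range(len(outputs)), 2):
        if outputs[i1] != outputs[i2]:
            for j in range(m):
                clauses.append([-a(i1, j), -a(i2, j)])
    return clauses


# Hypothetical three-step trace with outputs Y = 0, 1, 0 and m = 2 colors.
clauses = output_clauses([0, 1, 0], m=2)
```

Trace states 0 and 2 have the same output, so no clause separates them and the solver remains free to merge them.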

Condition 4: The transition from the state upon a symbol x must be unambiguous, i.e., the minimal FSM must be deterministic.

$$\neg B_{j, j', x} \lor \neg B_{j, j'', x} : \forall x \in [0 \dots k - 1]; \forall j, j', j'' \in [0 \dots m - 1]; j' < j''. \tag{6}$$
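The determinism requirement of Condition 4 can be sketched in Python as follows, assuming the numbering of Equation (2). The function name `determinism_clauses` is hypothetical.

```python
from itertools import combinations
from typing import List


def determinism_clauses(n: int, m: int, k: int) -> List[List[int]]:
    """Clauses of Condition 4: at most one target state per (state, input) pair."""
    def b(j_from: int, j_to: int, x: int) -> int:
        return (n * m + 1) + (j_from * m + j_to) * k + x  # Equation (2)

    clauses = []
    for x in range(k):
        for j in range(m):
            for j1, j2 in combinations(range(m), 2):
                clauses.append([-b(j, j1, x), -b(j, j2, x)])
    return clauses


clauses = determinism_clauses(n=8, m=3, k=4)
```

The number of clauses here depends only on m and k, not on the trace length, because the constraint concerns the minimal FSM alone.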

Condition 5: Each transition of the trace must be defined in a minimal FSM with a corresponding condition. Here, $x(i)$ is the condition for the transition between the $i$ and $i + 1$ states of the trace, and $j', j''$ denote the coloring of the states.

$$B_{j', j'', x(i)} \lor \neg A_{i, j'} \lor \neg A_{i + 1, j''} : \forall i \in [0 \dots n - 2]; \forall j', j'' \in [0 \dots m - 1]. \tag{7}$$
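Condition 5 links the two variable groups: if trace states $i$ and $i + 1$ are colored $j'$ and $j''$, the transition between these colors upon $x(i)$ must exist. The Python sketch below assumes that `inputs[i]` holds the symbol $x(i)$; the function name `transition_clauses` is hypothetical.

```python
from typing import List, Sequence


def transition_clauses(inputs: Sequence[int], m: int, k: int) -> List[List[int]]:
    """Clauses of Condition 5 for a trace whose step i is left upon inputs[i]."""
    n = len(inputs)

    def a(i: int, j: int) -> int:
        return 1 + i * m + j  # Equation (1)

    def b(j_from: int, j_to: int, x: int) -> int:
        return (n * m + 1) + (j_from * m + j_to) * k + x  # Equation (2)

    clauses = []
    for i in range(n - 1):
        for j1 in range(m):
            for j2 in range(m):
                clauses.append([b(j1, j2, inputs[i]), -a(i, j1), -a(i + 1, j2)])
    return clauses


# Hypothetical three-step trace with inputs x(0) = 0, x(1) = 1 and m = k = 2.
clauses = transition_clauses([0, 1, 0], m=2, k=2)
```

Each clause is the CNF form of the implication "A[i, j'] and A[i + 1, j''] imply B[j', j'', x(i)]".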

Condition 6: If the transition is not defined in the original FSM, its definition in the minimal FSM must be undefined as well.

$$\neg B_{t, x(i)} \lor \neg A_{i, j} \lor A_{i', j'} : \forall i, i' \in [0 \dots n - 1]; \forall j, j' \in [0 \dots m - 1]; i \neq i'; j \neq j'. \tag{8}$$

All the logical variables in the clauses are substituted with the corresponding variables from $C$. Then all the generated clauses are combined into a CNF by joining them with a logical AND to form a description of a SAT problem in the DIMACS format. To solve such a problem, the solver searches for the values of the logical variables that give the CNF a TRUE value. When such values are found, the problem is considered to be solved.
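Serializing a list of clauses into the DIMACS CNF format takes only a few lines of Python. The function name `to_dimacs` is hypothetical; the layout follows the standard format, with a `p cnf` problem line followed by one clause per line, each terminated by 0.

```python
from typing import List


def to_dimacs(num_vars: int, clauses: List[List[int]]) -> str:
    """Serialize clauses (lists of signed variable indices) as DIMACS CNF text."""
    lines = [f"p cnf {num_vars} {len(clauses)}"]
    for clause in clauses:
        lines.append(" ".join(str(lit) for lit in clause) + " 0")
    return "\n".join(lines) + "\n"


# (C1 or not C2) and (C2 or C3)
text = to_dimacs(3, [[1, -2], [2, 3]])
```

For the FSM minimization problem, the number of variables passed to such a function would be the total size of both tables, $n \cdot m + m^{2} \cdot k$.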

3.5. SAT-Based Trace Processing

The presented RE implementation searches for the number of states $m$ in a minimal FSM automatically using binary search. Binary search takes a logarithmic number of steps in the size of the search range, whereas linear search takes a linear number of steps. Binary search is applicable here because the candidate state counts form a range sorted in ascending order.

Due to the nature of the FSM minimization problem, the maximum number of states is limited by the number of pairs in the trace ($n$), and the minimum is limited by the number of unique output values ($|Y|$). The target number of states is found via binary search, where each candidate state count is checked by constructing the variable tables, defining the CNF, and running the solver on the resulting SAT problem. The implemented search strategy starts with the simplest option, in which the number of states equals the number of different output symbols. If the SAT problem has no solution under this condition, the search continues by choosing the median value in the range between the determined minimum and maximum values. If a solution is found for the median, the search continues in the range between the minimum and the last tested median; otherwise, it continues in the range between the median and the maximum, adhering to the binary search principle. The search for the optimal value of $m$ ends when the range cannot be narrowed any further, that is, when the median is only one less than the maximum possible value. Depending on whether or not a SAT solution exists for $m$ equal to the median, the final number of states for the minimal FSM will be either this median value or the previously established maximum. In both cases, the SAT problem is unsolvable for a state count one unit smaller than the final one, so the process yields a minimal FSM that adheres to the logic of the recovered device.

In summary, the described procedure aims to find the optimal value for m, which yields the minimal possible number of states needed to accurately represent the behavior of the original device. This is accomplished by iteratively exploring the range of possible m values, narrowing it down as new information arises, until a solution meeting all the constraints set by the SAT problem is attained.
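The search strategy can be sketched in Python with the solver hidden behind a callable. The names `find_min_states` and `is_sat` are hypothetical, and the sketch assumes two properties that the text implies but does not state explicitly: the problem is satisfiable for $m = n$, and satisfiability is monotone in $m$ (if a solution exists for some state count, one exists for every larger count).

```python
from typing import Callable


def find_min_states(num_outputs: int, n: int, is_sat: Callable[[int], bool]) -> int:
    """Smallest m in [num_outputs, n] for which is_sat(m) is True.

    Assumes is_sat(n) is True and is_sat is monotone in m (both are assumptions).
    """
    if is_sat(num_outputs):
        return num_outputs  # the simplest option already works
    lo, hi = num_outputs, n  # lo is known to fail, hi is assumed to succeed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_sat(mid):
            hi = mid  # a solution exists: look for a smaller one
        else:
            lo = mid  # no solution: more states are needed
    return hi


# Stand-in oracle for the illustrative example, where the minimal FSM has 3 states.
# A real oracle would build the CNF for m and run the SAT solver on it.
m_min = find_min_states(num_outputs=2, n=8, is_sat=lambda m: m >= 3)
```

In this example run, the oracle is called for m = 2, 5, and 3, instead of stepping through every value as a linear search might.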

3.6. Specification Generation

At this stage, it is necessary to correctly interpret the output of the solver, according to the converted tables from the previous stage. The obtained variable values should be interpreted according to their positions in the tables to determine the coloring of the states and the definition of state transitions. When logic 1 is present at the intersection in the coloring table, it means that the original state corresponds to this target state. In the transition table, the presence of logic 1 at the intersection means that there is a transition between the states by the input symbol.
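The interpretation of a solution can be sketched in Python by inverting the numbering scheme of Equations (1) and (2). The sketch assumes that the model is available as a list of signed integers, as in the DIMACS-style solution string described in Section 3.2; the function name `decode` and the returned data structures are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def decode(model: Sequence[int], n: int, m: int, k: int
           ) -> Tuple[List[int], Dict[Tuple[int, int], int]]:
    """Turn a model into a coloring list and a transition map (state, x) -> state."""
    true_vars = {lit for lit in model if lit > 0}
    # Coloring table: the single TRUE cell in row i gives the color of trace state i.
    coloring = [next(j for j in range(m) if 1 + i * m + j in true_vars)
                for i in range(n)]
    # Transitions table: every TRUE cell is one transition of the minimal FSM.
    base = n * m + 1
    transitions = {}
    for j_from in range(m):
        for j_to in range(m):
            for x in range(k):
                if base + (j_from * m + j_to) * k + x in true_vars:
                    transitions[(j_from, x)] = j_to
    return coloring, transitions


# Hypothetical model for n = 2, m = 2, k = 1: variables 1, 4 and 6 are TRUE.
coloring, transitions = decode([1, -2, -3, 4, -5, 6, -7, -8], n=2, m=2, k=1)
```

The output symbol of each minimal state can then be taken from any trace state that carries its color, since Condition 3 guarantees that all of them agree.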

The next step is to read the information from the tables and generate an HDL description and a UML diagram using templates. The code generator is scripted in Python 3. The synthesized FSM is described in the SystemVerilog (.sv) language. The state machine itself is described with two automatic behavioral blocks according to the prepared FSM template taken from [43]. UML diagrams are described using the syntax of the PlantUML language, which can be rendered in both vector and raster formats. Figure 5 presents a UML diagram for the illustrative FSM example (see Figure 4).

4. Experimental Evaluation

An experimental study was conducted to confirm the validity of the method and its implementation when applied to a trace of a digital device with cyclic behavior. The goal of the study was to show that the inferred FSMs accurately specify the observed functions of the systems.

The experiments follow the same three-step procedure:

1. Inference: for a given trace of an object device, a solution is obtained.
2. Test: to ensure that the identification procedure is correct, the behavior of the obtained FSM over the input values in the trace is compared to the output values in the trace, thus comparing expected and inferred behaviors and showing the correctness of the solution.
3. Verification: the obtained FSM is run over the variations of the trace to check that the inferred FSM behaves as expected with data previously not seen.
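The test and verification steps both amount to replaying a trace on the inferred FSM and comparing outputs. The experiments described here use a SystemVerilog testbench for this; the Python sketch below only illustrates the comparison logic. It assumes a Moore FSM in which the input observed at step $i$ drives the transition to the state of step $i + 1$, and the function name `matches_trace` is hypothetical.

```python
from typing import Dict, Sequence, Tuple


def matches_trace(trace: Sequence[Tuple[int, int]],
                  delta: Dict[Tuple[int, int], int],
                  outputs: Sequence[int],
                  q0: int = 0) -> bool:
    """Replay (x, y) pairs on a Moore FSM and compare its outputs with the trace."""
    state = q0
    for step, (x, y) in enumerate(trace):
        if outputs[state] != y:
            return False  # the FSM output differs from the observed output
        if step + 1 < len(trace):
            if (state, x) not in delta:
                return False  # the FSM has no transition for this input
            state = delta[(state, x)]
    return True


# Hypothetical two-state FSM and a trace it should reproduce.
delta = {(0, 1): 1, (1, 0): 0}
outputs = [0, 1]
ok = matches_trace([(1, 0), (0, 1), (1, 0)], delta, outputs)
```

Running the same check on trace variations that were not used for inference corresponds to the verification step.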

4.1. Experiment Parameters

When performing a behavior analysis for a system, it is crucial to ensure that the collected trace reflects the functions of the system that need to be specified. If the object exhibits behavioral variations, the trace should be gathered for an extended time period to capture these changes. The trace must cover at least one full cycle to provide an accurate representation of the system’s function. Thus, the observer conducting the analysis of a system is expected to have some understanding of the system and is responsible for the representativeness of the trace.

With these assumptions, the trace can be considered as a sequence of vertices and edges of a graph (a walk) over the state graph of the target FSM; that is, with states being vertices and transitions being edges of the graph. The walk itself can be open or closed. For a simple device, the walk will consist of the same single repeating closed sub-walk corresponding to a normal operational cycle of a device.

However, depending on the number of variations of the cyclic behavior, the particular walk can include multiple various closed sub-walks. Then, depending on the complexity of a particular graph, the walk will be of extended length relative to the length of a device’s entire normal operational cycle.

An FSM inference method should provide accurate results and consider all the aforementioned scenarios. However, while the trace length grows linearly, the convergence time grows exponentially, which can make the inference unattainable. This raises the question of whether some additional analysis of the trace chunks can make the inference less computationally demanding while retaining the same accuracy.

For a particular trace, $L$ is the whole walk length and $L_{0} < L$ is the length of a minimal closed walk that covers 100% of an operational cycle. Then the trace chunk to be used for inference should have a length of $n = L_{0} \cdot w$, where $w \geq 1$. A variation of a walk can be defined with an offset $l$ from the beginning of the trace. With $v$ being the count of the minimal walk variations, the experiment parameters are represented with the following set:

$$L^{v} = \{L, L_{0}, l, w\}. \tag{9}$$

The inference and test steps are performed on an $L^{v_{0}}$, while the verification step is performed on a $\{L^{v'}\}$, with $v_{0} \neq v'$.
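Selecting a trace chunk from these parameters is a simple slice. The Python sketch below uses the hypothetical function name `trace_chunk` and assumes that a fractional product $L_{0} \cdot w$ is truncated to an integer number of steps, which the text does not specify.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def trace_chunk(trace: Sequence[T], l0: int, offset: int, w: float = 1.0) -> List[T]:
    """Return the chunk of length n = l0 * w that starts at the given offset."""
    if w < 1:
        raise ValueError("w must be at least 1")
    n = int(l0 * w)  # truncation of fractional lengths is an assumption
    if offset + n > len(trace):
        raise ValueError("chunk does not fit into the trace")
    return list(trace[offset:offset + n])


# Hypothetical trace of L = 20 steps with a minimal closed walk of L0 = 4 steps.
whole_trace = list(range(20))
inference_chunk = trace_chunk(whole_trace, l0=4, offset=2, w=2)
verification_chunk = trace_chunk(whole_trace, l0=4, offset=10, w=2)
```

Using different offsets for inference and verification gives the distinct trace variations required by the experiment procedure.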

4.2. Accuracy Evaluation

We will use the term “accuracy” to refer to the equivalence of a specification to an example of the observed object behavior, as a qualitative measure of the RE results.

The equivalence of the inferred solution to the original FSM represented by a trace variation is inherent to the method. FSM minimization is achieved by a proper set of clauses fed to the SAT solver. The equivalency is checked at the test step using the same trace variation as at the inference step.

The confidence of the inference is determined by the accuracy of the solution and must be evaluated at the verification step. The solution is accurate only if it is equivalent to all the validation traces: the inferred FSM should match the observed behavior over the whole trace.

By definition, this verification task can be reduced to the problem of an observational equivalency check between $L^{v_{0}}$ and $L^{v'}$. Another way would be to perform passive FSM testing with initial state uncertainty. Both approaches are very well studied in the literature. However, for the sake of the experiment, we assume that a system observation allows the detection of the initial (idle) state $q_{0}$ in the trace. Indeed, the idle or reset state of the system can be expected to be identifiable. Then the corresponding states in $L^{v_{0}}$ and $L^{v'}$ traces should be known and can be marked as such.

Such convergence analysis at the verification step allows assessment of the operational cycle coverage, i.e., the sufficiency of the trace.
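With a known initial state, the verification step amounts to replaying a validation trace through the inferred FSM and comparing outputs. The sketch below shows one way to do this. The dictionary-based FSM representation, the function name `conforms`, and the reading of each pair as "the current state emits the output, then consumes the input" are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

State = int
Pair = Tuple[int, int]  # (input symbol, output symbol); assumed encoding


def conforms(outputs: Dict[State, int],
             delta: Dict[Tuple[State, int], State],
             trace: List[Pair],
             q0: State = 0) -> bool:
    """Replay `trace` from the known initial state q0 and compare outputs."""
    state = q0
    for x, y in trace:
        if outputs[state] != y:
            return False          # observed output differs from the model
        if (state, x) not in delta:
            return False          # the model has no transition for this input
        state = delta[(state, x)]
    return True


# Hypothetical two-state FSM: the output equals the state number,
# input 1 toggles the state, input 0 keeps it.
outputs = {0: 0, 1: 1}
delta = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

ok_trace = [(1, 0), (0, 1), (1, 1), (0, 0)]
bad_trace = [(1, 0), (1, 0)]
```

Here `conforms(outputs, delta, ok_trace)` accepts the first trace, while `bad_trace` is rejected at its second pair because the model is in state 1 and would emit 1 rather than 0.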

4.3. Experiment Framework

The experimental framework uses the method’s implementation, which includes a set of programs supporting the SAT-based FSM inference. At the first step of the experiment, it is necessary to determine the format of the initial data. If the trace has already been obtained, the stage of trace generation can be omitted. Otherwise, the trace_builder will create a trace from the waveform. The trace_builder quantizes an analog signal and decimates the sample rate to allow the signal representation in a trace format. The digital waveform representation mapped to a finite number of states enables the testbench to compare the behaviors of the generated FSM and the original system.
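The quantization and decimation performed by the trace_builder can be pictured with the following minimal sketch. It is not the code of the trace_builder itself: the single-threshold quantizer, the fixed decimation step, the function names, and the sample values are all assumptions chosen to keep the example short.

```python
from typing import List, Sequence


def quantize(samples: Sequence[float], threshold: float) -> List[int]:
    """Map each analog sample to a logic level (1 above the threshold, else 0)."""
    return [1 if s > threshold else 0 for s in samples]


def decimate(levels: Sequence[int], step: int) -> List[int]:
    """Keep every `step`-th sample, i.e., reduce the sample rate by `step`."""
    if step < 1:
        raise ValueError("step must be at least 1")
    return list(levels[::step])


# Toy waveforms sampled four times per clock period (all values are made up).
in_wave = [0.1, 0.2, 0.1, 0.0, 3.2, 3.3, 3.1, 3.3, 0.2, 0.1, 0.0, 0.1]
out_wave = [3.3, 3.2, 3.3, 3.1, 3.2, 3.3, 3.2, 3.3, 0.1, 0.0, 0.2, 0.1]

x = decimate(quantize(in_wave, threshold=1.65), step=4)
y = decimate(quantize(out_wave, threshold=1.65), step=4)
trace = list(zip(x, y))  # one (input, output) pair per clock period
```

In practice the threshold and the decimation step would have to be matched to the logic levels and the clock rate of the observed device.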

Then the trace goes to the SAT_invocator, which constructs the tables described above (see Table 1 and Table 2), forms clauses from them according to the conditions of the method, and passes the resulting SAT problem to the solver to find a minimal FSM.
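The variable numbering of Table 1 and Table 2 can be reproduced with two small index functions, as sketched below for a trace with 8 states, a candidate FSM with 3 states, and 4 input symbols. The function names are hypothetical, and the "exactly one color per trace state" constraint is shown only as a common ingredient of SAT-based coloring encodings; it is an assumption here and does not reproduce the full clause set of Section 3. Clauses are lists of signed integers in the DIMACS convention, where a negative number denotes a negated variable.

```python
from itertools import combinations
from typing import List


def var_a(i: int, j: int, m: int) -> int:
    """Coloring variable: trace state i (0-based) is mapped to FSM state j."""
    return i * m + j + 1


def var_b(j_from: int, j_to: int, x: int, n: int, m: int, k: int) -> int:
    """Transition variable: state j_from goes to j_to on input symbol x."""
    return n * m + (j_from * m + j_to) * k + x + 1


def exactly_one_color(n: int, m: int) -> List[List[int]]:
    """Clauses forcing every trace state to take exactly one color."""
    clauses: List[List[int]] = []
    for i in range(n):
        row = [var_a(i, j, m) for j in range(m)]
        clauses.append(row)                                        # at least one
        clauses.extend([-a, -b] for a, b in combinations(row, 2))  # at most one
    return clauses


n, m, k = 8, 3, 4   # sizes used in Table 1 and Table 2
clauses = exactly_one_color(n, m)
```

With these sizes, the coloring variables occupy the numbers 1 to 24 and the transition variables the numbers 25 to 60, which matches the two tables.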

The obtained solution goes to the FSM_generator, where the construction of the minimal FSM in the SystemVerilog language and PlantUML [44] takes place using predefined templates. The SystemVerilog template describes the code structure of an FSM state and transitions from this state. The FSM_generator is a Python script that takes the information about states and transitions of the FSM, and (in a stepwise manner) generates the corresponding code structure for each state. To describe a state and a transition in PlantUML language, two lines of code are needed, which take the number of a state and its output and input values. The generation of the PlantUML description is performed in the same stepwise manner.
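The stepwise, template-based generation can be sketched for the PlantUML output as follows. This is an illustrative reconstruction, not the FSM_generator script: the function name `to_plantuml`, the state naming scheme `S<number>`, and the label texts are assumptions. Only basic PlantUML state-diagram syntax is used.

```python
from typing import Dict, List, Tuple


def to_plantuml(outputs: Dict[int, int],
                delta: Dict[Tuple[int, int], int],
                q0: int = 0) -> str:
    """Render an FSM as a PlantUML state diagram, state by state."""
    lines: List[str] = ["@startuml", f"[*] --> S{q0}"]
    for state in sorted(outputs):
        # First line per state: the state and its output value.
        lines.append(f"S{state} : y = {outputs[state]}")
        # One line per outgoing transition: target state and input value.
        for (src, x), dst in sorted(delta.items()):
            if src == state:
                lines.append(f"S{src} --> S{dst} : x = {x}")
    lines.append("@enduml")
    return "\n".join(lines)


outputs = {0: 0, 1: 1}
delta = {(0, 1): 1, (1, 1): 0}
diagram = to_plantuml(outputs, delta)
print(diagram)
```

A SystemVerilog generator could follow the same loop structure, emitting one case branch per state instead of diagram lines.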

To check the accuracy and reliability of the RE implementation, a SystemVerilog testbench was created for the test and verification steps of the experiment. The testbench reads the transition data from the file “trace.csv”, the same data that are used for the FSM inference. It then sends the input signals from the trace to the inferred FSM and compares the reactions of the FSM with the output data recorded in the trace. The results of the comparison indicate whether the original FSM and the resulting FSM are equivalent. Figure 6 provides an overview of the framework.

4.4. Experimental Study

4.4.1. Synthetic FSM as a White Box

To evaluate the effectiveness and verify the functionality of the proposed method, a test FSM consisting of seven unique states was developed. A test plan was formulated; the task was to obtain a complete sequence of transitions describing the FSM in the form of a set of 37 pairs of input and output values. This number of pairs covers all the possible transitions in this FSM, including a return to the initial state, from which the recording of this sequence began. Each pair represents 1 of 37 states, and for each state, only the inputs and corresponding outputs are known.

The next step is to create tables, 37 × 7 and 49 × 8 in size, using the algorithm described in Section 3. The algorithm, which is based on the binary search method, begins the search with the minimum possible number of states (Section 4.3). Since the experiment uses an FSM with 7 unique outputs, the search begins with this number. If more states were required, the search would continue further. With the help of these tables and the code discussed above, a CNF formula was formed for the SAT solver, which successfully found a solution. The solution indicates that an FSM has been found whose behavior replicates that of the synthetic one. The found solution was verified by signal comparison in the testbench, which showed identical behavior.
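The search over the number of states can be sketched as a binary search around a satisfiability oracle. In the sketch below the oracle is a stand-in function; in the real flow it would build the CNF for a given number of states and call the SAT solver. The function names, the use of the trace length as an upper bound, and the monotonicity assumption stated in the docstring are assumptions of this example.

```python
from typing import Callable, Optional


def find_min_states(lower: int, upper: int,
                    is_satisfiable: Callable[[int], bool]) -> Optional[int]:
    """Binary search for the smallest m in [lower, upper] that is satisfiable.

    Assumes monotonicity: if an FSM with m states fits the trace,
    an FSM with m + 1 states fits it as well.
    """
    best = None
    while lower <= upper:
        mid = (lower + upper) // 2
        if is_satisfiable(mid):
            best = mid          # feasible; try fewer states
            upper = mid - 1
        else:
            lower = mid + 1     # infeasible; more states are needed
    return best


# Stand-in oracle for illustration only: pretend the trace needs 7 states.
def fake_oracle(m: int) -> bool:
    return m >= 7


# Lower bound: number of distinct outputs; upper bound: trace length.
m_min = find_min_states(lower=7, upper=37, is_satisfiable=fake_oracle)
```

Starting from the number of distinct outputs is safe because two trace states with different outputs can never share a state of the minimal FSM.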

For further analysis, the trace was extended to three times its original length to assess the impact of data redundancy on FSM recovery. The resulting state machine turned out to be identical to the original one, which indicates that, in this experiment, repeating the walk in a longer trace did not affect the recovery result.

Next, an experiment was conducted using a different section of 37 pairs from a three-fold extended walk, with a modified order of transitions. The analysis of this walk led to the decision to move the window forward by 3 and 34 pairs; this is demonstrated in Figure 7. The presented method implementation currently assumes that the trace always starts from an initial state, to which the system returns upon reset (see Section 4.2). The shift values were selected correspondingly as reference positions of initial states in the trace, where the FSM walk eventually returned. As a result, the minimal FSM obtained was identical to the original one, except for the numbering of states.
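Checking that two recovered machines are identical except for the numbering of states can be automated. The following sketch walks both machines in parallel from their initial states and builds the renumbering on the way. The dictionary-based representation and the function name are assumptions; the sketch also assumes that both machines are deterministic and have no unreachable states.

```python
from collections import deque
from typing import Dict, Tuple

Outputs = Dict[int, int]
Delta = Dict[Tuple[int, int], int]


def same_up_to_renumbering(out_a: Outputs, delta_a: Delta, q0_a: int,
                           out_b: Outputs, delta_b: Delta, q0_b: int) -> bool:
    """Check that two deterministic FSMs differ only in state numbering."""
    mapping = {q0_a: q0_b}          # state of A -> matching state of B
    queue = deque([q0_a])
    while queue:
        a = queue.popleft()
        b = mapping[a]
        if out_a[a] != out_b[b]:
            return False
        inputs_a = {x for (s, x) in delta_a if s == a}
        inputs_b = {x for (s, x) in delta_b if s == b}
        if inputs_a != inputs_b:
            return False
        for x in inputs_a:
            next_a, next_b = delta_a[(a, x)], delta_b[(b, x)]
            if next_a not in mapping:
                mapping[next_a] = next_b
                queue.append(next_a)
            elif mapping[next_a] != next_b:
                return False
    # The mapping must be one-to-one and cover every state of both machines.
    return (len(set(mapping.values())) == len(mapping)
            and len(mapping) == len(out_a) == len(out_b))


# Machine B is machine A with states 1 and 2 swapped.
out_a = {0: 0, 1: 1, 2: 2}
delta_a = {(0, 1): 1, (1, 1): 2, (2, 1): 0}
out_b = {0: 0, 2: 1, 1: 2}
delta_b = {(0, 1): 2, (2, 1): 1, (1, 1): 0}
identical = same_up_to_renumbering(out_a, delta_a, 0, out_b, delta_b, 0)
```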

The experiment confirmed that the proposed method has high stability and reliability when restoring FSM. Regardless of the specific sequence of transitions, the method is able to correctly determine the states and transitions between them in the original FSM.

In this series of experiments, while analyzing a synthetic FSM with predefined states and transitions, the inference results were compared with the expected outcome to verify the inference procedure’s accuracy. Also, the complexity of the synthetic FSM was employed to estimate the implementation’s efficiency.

4.4.2. TCS Encoder as a Black Box

An example trace obtained from a TCS encoder was analyzed and compared to the results of manual RE to showcase the abilities of the method’s implementation in a real-world design scenario. The TCS encoder is an element of the trigger control subsystem (TCS) in a legacy data acquisition system; it was built in 2000 for the COMPASS experiment at CERN. The TCS encoder has two input channels (A and B); it encodes their data into a single bitstream and outputs it to the optical fiber synchronization network. An in-depth analysis of the documents on TCS revealed that the TCS encoder performs bi-phase mark coding for time-multiplexed data from channels A and B. However, for the sake of the experiment, the device was treated as a black box.

After the data had been collected using an oscilloscope, the signals required quantization and decimation. This procedure produced a list of pairs, which can be interpreted as a trace; the length of the resulting array depends on the “depth” of the sampling. In this particular example, a sampling technique was employed to immediately eliminate possible unnecessary cycles within a single state of the finite state machine that describes the overall behavior of the device. The result was a trace with a length of exactly 117 pairs.
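The sampling technique is not detailed here, so the following sketch shows only one possible way to remove cycles within a single state: collapsing runs of identical consecutive (input, output) pairs into one pair. The function name and the toy data are assumptions. This simplification is appropriate only when a repeated pair carries no information, which would not hold, for example, for a device that counts repeated input symbols.

```python
from itertools import groupby
from typing import List, Tuple

Pair = Tuple[int, int]  # (input symbol, output symbol); assumed encoding


def collapse_repeats(trace: List[Pair]) -> List[Pair]:
    """Replace each run of identical consecutive pairs with a single pair."""
    return [pair for pair, _run in groupby(trace)]


oversampled = [(0, 0), (0, 0), (0, 0), (1, 0), (1, 1), (1, 1), (0, 1)]
trace = collapse_repeats(oversampled)
```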

After the necessary transformations, a trace was obtained that is well-suited for the method. No significant differences were observed compared to the scenario with a synthetic FSM. Only one assumption was made: that this fragment of a waveform contains enough information about the observable behavior of the given device and that there are no noise-infested areas within it. Thus far, the stages of the experiment have not differed from those of working with a “white box”. The only deviation arose from the need to locate, in an iterative manner, the starting point of the window through which the walk of the given FSM passes. In the first scenario, the trace was not altered: after the conditional templates had been applied and the Boolean variables substituted, it was fed to the solver in the form of larger tables.

At the selected sampling frequency, a minimal FSM with four states was restored. To make sure that the FSM was correctly inferred, it was tested with the selected sample trace and the original trace in the testbench. The steps of the performed experiment are presented in Figure 8, showing a section of the waveform with the A, B, and output signals, and an overview of the inferred FSM.

5. Discussion

The presented experimental results show that the SAT-based approach can be successfully used to retrieve the functional specification of a digital device with observable inputs and outputs. The method can be useful in different stages of a design flow. The first experimental study focuses on specification retrieval for a synthetic FSM, treating it as a “white box” design, which exemplifies a current design at the development or maintenance stages. The second experimental study describes a real practical case of specification retrieval for a TCS encoder as a legacy “black box” design.

In a “white box” scenario, the object under study would be a design that is developed or maintained, with a known implementation, but with the specification lacking or not properly updated. In common engineering practices, complex designs usually comprise a hierarchy of smaller design units resembling small-scale discrete systems. In particular, an FPGA-based design is an array of hierarchically connected modules with certain functionality. In such cases, the approach can be used to assist the design process by inferring an additional graphical view of a module function to verify an understanding of the module behavior or for troubleshooting. For such a “white box” scenario, the solution can be regarded as successfully converged if the number of deduced states ($m$) equals the predefined number of states in a tested design ($m_{0}$). If $m$ happens to be less than $m_{0}$, the designer can conclude that either the trace is insufficient and some test cases have not been covered, or that the design is redundant and can be optimized. With a predefined notion of the design, FSM inference can be iterative, until the required specification is obtained.

Compared to the “white box” scenario, a “black box” scenario poses a few challenges regarding system observations, trace preparation, and inference result interpretation. In the “black box” scenario, the object under study is typically a legacy system that does not have a specification or has conflicting specifications. In this case, the approach is able to produce a synthesizable description of the device, according to the provided traces of its behavior. The following discussion covers the limitations and vulnerabilities of the implemented RE approach in this “black box” scenario.

One limitation of the approach is that it can only use historical trace data and offline data processing due to the iterative search nature of the algorithm and the computational demands of its implementation. Although the process guarantees that it will eventually find the correct solution, it is crucial to plan the experiment to obtain the trace beforehand.

The following two are the limitations of the solution itself. The number of inferred states and definitions of transitions in the FSM may not fully describe the original system.

The results of any inference are limited by the scope of observation. To obtain a solution that reliably exhibits the intended behavior, sufficient data must be collected. This means that the observation time should cover all the characteristic time intervals with all possible scenarios of the device use. Traces should be collected only in environments and under operating conditions in which appropriate and adequate observations can be conducted. Only then will the expected outcome cover the significant aspects of the system’s behavior. Ideally, the test should contain all scenarios under RE study, i.e., scenarios where the new system is planned to be used, including rare events.

Such a rigorous exercise to obtain extensive traces is often unfeasible, and some behaviors will remain unobservable. Indeed, a proper design implements a set of safe functions to handle unexpected or hazardous conditions in the system environment. The number of safe functions and their corresponding conditions are unknown and are likely to be left out of the observations. Although this limits the completeness of RE, it does not interfere with a typical reengineering design flow. Based on a correctly obtained specification of the main function, the design will be augmented with new requirements, typically including a more current safe behavior description. This approach is called a best-effort approach [30]. It should be noted that the approach does not allow the identification of invisible states and transitions in a complex system analysis.

Another characteristic of the solution is that a recovered FSM is incomplete. However, a typical state machine in a digital device is designed to be incomplete, meaning that the automaton usually expects a certain sequence of input events to perform the transitions and may ignore irrelevant events. For example, to provide this kind of safe behavior, the presented implementation of the approach adds self-loop transitions for the obtained solution so that the resulting specification does not contain unintended functionality.
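Adding self-loop transitions to an incomplete transition function can be sketched as follows. The dictionary-based representation and the function name `add_self_loops` are assumptions made for illustration; the sketch shows only the idea of mapping every unobserved (state, input) combination back to the same state.

```python
from typing import Dict, Iterable, Tuple

Delta = Dict[Tuple[int, int], int]


def add_self_loops(delta: Delta, states: Iterable[int],
                   inputs: Iterable[int]) -> Delta:
    """Return a copy of `delta` where every missing (state, input) pair
    is mapped back to the same state."""
    completed = dict(delta)
    input_list = list(inputs)
    for state in states:
        for x in input_list:
            completed.setdefault((state, x), state)
    return completed


partial = {(0, 1): 1, (1, 1): 0}            # transitions seen in the trace
full = add_self_loops(partial, states=[0, 1], inputs=[0, 1])
```

Observed transitions are left untouched, so the completed machine still reproduces the trace while ignoring inputs that were never seen in a given state.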

Furthermore, it is important to note that although the studied approach is considered black-box testing, some knowledge of the system is still required. It is assumed that the target object has a known interface and discrimination between input and output signals can be made. It is also assumed that a proper monitor placement in a working environment can be made to derive examples of system behavior. Moreover, for efficient accuracy evaluation, the knowledge of an initial state should be available.

Another known vulnerability of the presented approach is the inability to estimate the accuracy of the solution in the presence of erroneous traces. This is a known problem in the field of RE [45] that is not addressed in this publication.

Finally, the method implemented in this article is most appropriate for interfaces with discrete signals when a single signal change manifests an input or output event. If the interface performs any data or command encapsulation, the range of input and output symbols will grow exponentially, and generated specifications will become incomprehensible, which nullifies the expected benefits of the RE.

In conclusion, SAT solvers can be recommended to obtain functional specifications of small-scale digital devices. This method can be a useful design assistant in both forward and reengineering design flows. The most significant advantage of this method is that it relies on passive observation of the system and only requires a signal monitor to be placed correctly to sample device inputs and outputs. The specification is then produced offline based on the gathered system traces. At the same time, the passive nature of the approach results in its main limitation: the inference results are bounded by the scope of observation; hence, the observations should cover all the significant aspects of the system’s behavior. This requires the experiments to be carefully planned.

The implementation of this method also made some assumptions, which could be the subject of further research. The applicability is limited to designs that have simple discrete interfaces with known discrimination between input and output signals. Also, the traces were assumed to have no sampling or quantization errors.

Author Contributions

Conceptualization, O.M.; methodology, O.M. and D.C.; software, D.C.; validation, D.C.; formal analysis, D.C.; investigation, D.C.; resources, M.B. and A.D.; data curation, D.C.; writing—original draft preparation, D.C. and O.M.; writing—review and editing, O.M. and D.C.; visualization, D.C.; supervision, M.B.; project administration, A.D.; funding acquisition, M.B. All authors have read and agreed to the published version of the manuscript.

Funding

The research is funded by the Ministry of Science and Higher Education of the Russian Federation as part of the World-Class Research Center program: Advanced Digital Technologies (contract no. 075-15-2022-311, dated 20 April 2022).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The method’s implementation and the experimental data can be found at https://github.com/Danil891/FSM_SAT (accessed on 20 October 2023).

Acknowledgments

We would like to express our sincere gratitude to Alexey Filippov for the fruitful discussions at the conceptualization stages of this research. We would also like to thank Elizaveta Nikulina for the diligent proofreading and Anna Boiko for the invaluable administrative support.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

CNF	conjunctive normal form
DIMACS	Center for Discrete Mathematics and Theoretical Computer Science
FSM	finite state machine
HDL	hardware description language
RE	reverse engineering
SAT	Boolean satisfiability
TCS	trigger control subsystem
UML	unified modeling language

References

1. Feiler, P.H. Reengineering: An Engineering Problem; Technical Report; Defense Technical Information Center: Fort Belvoir, VA, USA, 1993.
2. Romanski, G.; DeWalt, M.; Daniels, D. Reverse Engineering for Software and Digital Systems; Technical Report DOT/FAA/TC-15/27; Federal Aviation Administration USA: Washington, DC, USA, 2016.
3. Fyrbiak, M.; Strauß, S.; Kison, C.; Wallat, S.; Elson, M.; Rummel, N.; Paar, C. Hardware Reverse Engineering: Overview and Open Challenges. In Proceedings of the 2017 IEEE 2nd International Verification and Security Workshop (IVSW), Thessaloniki, Greece, 3–5 July 2017; pp. 88–94.
4. Kella, J. Sequential Machine Identification. IEEE Trans. Comput. 1971, C-20, 332–338.
5. Gold, E.M. Complexity of Automaton Identification from Given Data. Inf. Control 1978, 37, 302–320.
6. Heule, M.J.H.; Verwer, S. Exact DFA Identification Using SAT Solvers. In Grammatical Inference: Theoretical Results and Applications; Sempere, J.M., García, P., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2010; pp. 66–79.
7. Avellaneda, F.; Petrenko, A. FSM Inference from Long Traces. In Formal Methods; Havelund, K., Peleska, J., Roscoe, B., de Vink, E., Eds.; Springer International Publishing: Berlin/Heidelberg, Germany, 2018; Volume 10951, pp. 93–109.
8. Abel, A.; Reineke, J. MEMIN: SAT-based Exact Minimization of Incompletely Specified Mealy Machines. In Proceedings of the 2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Austin, TX, USA, 2–6 November 2015; pp. 94–101.
9. Solé, M.; Carmona, J. Region-Based Foldings in Process Discovery. IEEE Trans. Knowl. Data Eng. 2013, 25, 192–205.
10. Ulyantsev, V. Generation of Finite State Machines Using Software Tools for Solving Satisfiability Problems and Satisfying Constraints. Ph.D. Thesis, ITMO University, Saint Petersburg, Russia, 2015.
11. Angluin, D. Learning Regular Sets from Queries and Counterexamples. Inf. Comput. 1987, 75, 87–106.
12. Lee, D.; Yannakakis, M. Principles and Methods of Testing Finite State Machines. A Survey. Proc. IEEE 1996, 84, 1090–1123.
13. Kudryavtsev, V.B.; Grunskii, I.S.; Kozlovskii, V.A. Analysis of Behaviour of Automata. Discret. Math. Appl. 2009, 19, 1–35.
14. Oliveira, A.; Edwards, S. Inference of State Machines from Examples of Behavior; Technical Report UCB/ERL M95/12; EECS Department, University of California: Riverside, CA, USA, 1995.
15. Redman, D.; Ward, D.; Carrico, M. AFE 87—Machine Learning; Final Report; Technical Report 87-REP-01; Aerospace Vehicle Systems Institute: College Station, TX, USA, 2020.
16. Schmidt, L.; Narayan, A.; Fischmeister, S. TREM: A Tool for Mining Timed Regular Specifications from System Traces. In Proceedings of the 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), Urbana, IL, USA, 30 October–3 November 2017; pp. 901–906.
17. Pastore, F.; Micucci, D.; Guzman, M.; Mariani, L. TkT: Automatic Inference of Timed and Extended Pushdown Automata. IEEE Trans. Softw. Eng. 2022, 48, 617–636.
18. Mousavi, A.; Far, B.H. Harnessing Overgeneralization in the Synthesis of State Machines from Scenarios. In Proceedings of the 2008 Canadian Conference on Electrical and Computer Engineering, Niagara Falls, ON, Canada, 4–7 May 2008; pp. 001107–001112.
19. Majma, N.; Babamir, S.M.; Monadjemi, A. Runtime Verification of Pacemaker Using Fuzzy Logic and Colored Petri-Nets. In Proceedings of the 2015 4th Iranian Joint Congress on Fuzzy and Intelligent Systems (CFIS), Zahedan, Iran, 9–11 September 2015; pp. 1–5.
20. Junior, M.A.d.O.; Ribeiro, L.; Duarte, L.M.; Cota, E. Specification of Models Based on Contexts using Graph Grammars. In Proceedings of the 2013 2nd Workshop-School on Theoretical Computer Science, Rio Grande, Brazil, 15–17 October 2013; pp. 129–134.
21. Blokhin, K.; Saxe, J.; Mentis, D. Malware Similarity Identification Using Call Graph Based System Call Subsequence Features. In Proceedings of the 2013 IEEE 33rd International Conference on Distributed Computing Systems Workshops, Philadelphia, PA, USA, 8–11 July 2013; pp. 6–10.
22. Le, T.D.B.; Le, X.B.D.; Lo, D.; Beschastnikh, I. Synergizing Specification Miners through Model Fissions and Fusions (T). In Proceedings of the 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), Lincoln, NE, USA, 9–13 November 2015; pp. 115–125.
23. Mahato, P.K.; Narayan, A. MINTS: Unsupervised Temporal Specifications Miner. In Proceedings of the 2021 IEEE 21st International Conference on Software Quality, Reliability and Security (QRS), Hainan, China, 6–10 December 2021; pp. 841–851.
24. Lu, T.; Liu, C.; Duan, H.; Zeng, Q. Mining Component-Based Software Behavioral Models Using Dynamic Analysis. IEEE Access 2020, 8, 68883–68894.
25. Kumar, S.; Khoo, S.C.; Roychoudhury, A.; Lo, D. Mining Message Sequence Graphs. In Proceedings of the 2011 33rd International Conference on Software Engineering (ICSE), Honolulu, HI, USA, 21–28 May 2011; pp. 91–100.
26. Mariani, L.; Pezzè, M.; Santoro, M. GK-Tail+ An Efficient Approach to Learn Software Models. IEEE Trans. Softw. Eng. 2017, 43, 715–738.
27. Narayan, J.; Shukla, S.K.; Clancy, T.C. A Survey of Automatic Protocol Reverse Engineering Tools. ACM Comput. Surv. 2015, 48, 1–26.
28. Sija, B.D.; Goo, Y.H.; Shim, K.S.; Hasanova, H.; Kim, M.S. A Survey of Automatic Protocol Reverse Engineering Approaches, Methods, and Tools on the Inputs and Outputs View. Secur. Commun. Networks 2018, 2018, e8370341.
29. Antunes, J.; Neves, N.; Verissimo, P. Reverse Engineering of Protocols from Network Traces. In Proceedings of the 2011 18th Working Conference on Reverse Engineering, Limerick, Ireland, 17–20 October 2011; pp. 169–178.
30. Shu, G.; Lee, D. A Formal Methodology for Network Protocol Fingerprinting. IEEE Trans. Parallel Distrib. Syst. 2011, 22, 1813–1825.
31. Zhang, Z.; Wen, Q.Y.; Tang, W. Mining Protocol State Machines by Interactive Grammar Inference. In Proceedings of the 2012 Third International Conference on Digital Manufacturing & Automation, Guilin, China, 31 July–2 August 2012; pp. 524–527.
32. Verwer, S.; Hammerschmidt, C.A. flexfringe: A Passive Automaton Learning Package. In Proceedings of the 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME), Shanghai, China, 17–22 September 2017; pp. 638–642.
33. Tsai, S.C.; Chang, C.P.; King, C.T. Reverse Engineering of Dynamic Parallel Program Behavior from Execution Traces. In Proceedings of the 2016 IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS), Wuhan, China, 13–16 December 2016; pp. 1075–1082.
34. Iegorov, O.; Fischmeister, S. Mining Task Precedence Graphs from Real-Time Embedded System Traces. In Proceedings of the 2018 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), Porto, Portugal, 11–13 April 2018; pp. 251–260.
35. Jeppu, N.Y.; Melham, T.; Kroening, D.; O’Leary, J. Learning Concise Models from Long Execution Traces. arXiv 2020.
36. Li, W.; Wasson, Z.; Seshia, S.A. Reverse Engineering Circuits Using Behavioral Pattern Mining. In Proceedings of the 2012 IEEE International Symposium on Hardware-Oriented Security and Trust, San Francisco, CA, USA, 3–4 June 2012; pp. 83–88.
37. Estrada-Vargas, A.P.; López-Mellado, E.; Lesage, J.J. A Black-box Identification Method for Automated Discrete Event Systems. IEEE Trans. Autom. Sci. Eng. 2015, 14, 1321–1336.
38. Wu, X.; Dai, W. Data-Driven Behaviour Model Recovery Method for Finite-State Transition Model. In Proceedings of the IECON 2020 The 46th Annual Conference of the IEEE Industrial Electronics Society, Singapore, 18–21 October 2020; pp. 3811–3816.
39. Chivilikhin, D.; Ulyantsev, V.; Shalyto, A.; Vyatkin, V. Function Block Finite-State Model Identification Using SAT and CSP Solvers. IEEE Trans. Ind. Inform. 2019, 15, 4558–4568.
40. Soos, M.; Nohl, K.; Castelluccia, C. Extending SAT Solvers to Cryptographic Problems. In Theory and Applications of Satisfiability Testing—SAT 2009; Kullmann, O., Ed.; Springer: Berlin/Heidelberg, Germany, 2009; Lecture Notes in Computer Science; pp. 244–257.
41. Python-Minisat. Available online: https://github.com/pgdr/python-minisat/tree/master (accessed on 1 September 2023).
42. Ignatiev, A.; Morgado, A.; Marques-Silva, J. PySAT: A Python Toolkit for Prototyping with SAT Oracles. In Theory and Applications of Satisfiability Testing—SAT 2018; Beyersdorff, O., Wintersteiger, C., Eds.; Springer: Berlin/Heidelberg, Germany, 2018; Lecture Notes in Computer Science; pp. 428–437.
43. Cummings, C.E.; Chambers, H. Finite State Machine (FSM) Design & Synthesis Using SystemVerilog—Part I. Available online: http://www.sunburst-design.com/papers/CummingsSNUG2019SV_FSM1.pdf (accessed on 1 September 2023).
44. Open-Source Tool That Uses Simple Textual Descriptions to Draw Beautiful UML Diagrams. Available online: https://plantuml.com/ (accessed on 1 September 2023).
45. Chivilikhin, D.; Patil, S.; Chukharev, K.; Cordonnier, A.; Vyatkin, V. Automatic State Machine Reconstruction From Legacy Programmable Logic Controller Using Data Collection and SAT Solver. IEEE Trans. Ind. Inform. 2020, 16, 7821–7831.

Figure 1. Reengineering process.

Figure 2. Finite state machine (FSM) with a series of transitions corresponding to a trace as a series of events in an observable system’s behavior.

Figure 3. FSM with three groups of identified equivalent states and corresponding coloring of equivalent states.

Figure 4. Minimal FSM.

Figure 5. Unified modeling language (UML) diagram.

Figure 6. Experimental framework structure.

Figure 7. Experiments with synthetic FSM.

Figure 8. Illustration of the experiment to reverse-engineer the TCS encoder. On the waveform, the red and green signals are the input signals A and B, the blue signal is the output of the system, and the yellow signal is an additional unused clk signal.

Table 1. The coloring table with variables A.

Outputs Y	State in the Trace $Q_{i}$	Minimal FSM State $Q_{0}^{*}$	Minimal FSM State $Q_{1}^{*}$	Minimal FSM State $Q_{2}^{*}$
$Y_{1}$	$Q_{1}$	1	2	3
$Y_{2}$	$Q_{2}$	4	5	6
$Y_{3}$	$Q_{3}$	7	8	9
$Y_{4}$	$Q_{4}$	10	11	12
$Y_{5}$	$Q_{5}$	13	14	15
$Y_{6}$	$Q_{6}$	16	17	18
$Y_{7}$	$Q_{7}$	19	20	21
$Y_{8}$	$Q_{8}$	22	23	24

Table 2. The table of transitions with variables B.

Transitions		Transition Condition as Input Symbol from X
Initial State $Q_{j'}^{*}$	Target State $Q_{j''}^{*}$	$X_{1}$	$X_{2}$	$X_{3}$	$X_{4}$
$Q_{0}^{*}$	$Q_{0}^{*}$	25	26	27	28
$Q_{0}^{*}$	$Q_{1}^{*}$	29	30	31	32
$Q_{0}^{*}$	$Q_{2}^{*}$	33	34	35	36
$Q_{1}^{*}$	$Q_{0}^{*}$	37	38	39	40
$Q_{1}^{*}$	$Q_{1}^{*}$	41	42	43	44
$Q_{1}^{*}$	$Q_{2}^{*}$	45	46	47	48
$Q_{2}^{*}$	$Q_{0}^{*}$	49	50	51	52
$Q_{2}^{*}$	$Q_{1}^{*}$	53	54	55	56
$Q_{2}^{*}$	$Q_{2}^{*}$	57	58	59	60

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
