State Merging and Splitting Strategies for Finite State Machines Implemented in FPGA

Klimowicz, Adam; Salauyou, Valery

doi:10.3390/app12168134


by Adam Klimowicz * and Valery Salauyou

Faculty of Computer Science, Bialystok University of Technology, Wiejska 45A, 15-351 Bialystok, Poland

* Author to whom correspondence should be addressed.

Appl. Sci. 2022, 12(16), 8134; https://doi.org/10.3390/app12168134

Submission received: 18 July 2022 / Revised: 9 August 2022 / Accepted: 12 August 2022 / Published: 14 August 2022

(This article belongs to the Section Electrical, Electronics and Communications Engineering)


Abstract

Different strategies for combining the merging and splitting transformation procedures for incompletely specified finite state machines implemented on field-programmable logic devices are proposed. In these methods, optimization criteria such as the speed of operation, power consumption and implementation cost are considered already in the early phase of finite state machine synthesis. The methods also take into account the technological features of programmable logic devices and the state assignment method. The transformation quality ratio is calculated on the basis of estimates of the consumed power, critical path delay and number of utilized logic cells. The user is also able to choose the order of the merging and splitting procedures and the direction of the optimization by setting weights for each criterion. The methods for estimating the values of the optimization criteria are described, and the experimental results are discussed.

Keywords:

logic synthesis; field-programmable gate arrays; finite state machines; logic optimization; state splitting; state merging

1. Introduction

A digital system can be described as a collection of finite state machines (FSMs) and combinational circuits. FSMs are often used as independent control modules. Usually, the original machines must be created by the engineer for each new project. The success of the entire project is largely determined by the parameters of the FSMs built into the digital circuit. Therefore, the question of finding the optimal representation of a finite state machine is always topical.

Currently, field-programmable gate arrays (FPGA) are commonly applied for building digital systems. A substantial number of optimization algorithms for finite state machines are focused on their implementation in FPGA. The criteria for optimizing FSMs are generally the area (cost of implementation), speed (critical delay path) and power consumption (dissipation). The area criterion is not a critical restriction because new FPGAs have a great number of logical elements built from look-up tables (LUT). Recently, the most significant optimization criteria are critical path delay and energy consumption.

The traditional process of the synthesis of finite state machines contains the following phases, which are sequentially executed: a minimization of the number of states (state merging), an encoding of states (state assignment) and a synthesis of the combinational part of finite state machine. Nevertheless, traditional methods frequently contradict the FSM optimization target at the stage of logic synthesis because all of the above-mentioned design phases totally ignore the characteristics of the technological base and the constraints of the logic synthesis process.

A classic approach to the problem of machine state merging relies on creating sets of matching states and searching for a minimal closed cover, which is an NP-complete problem [1]. Ref. [2] describes an exact minimization method based on the mapping of incompletely specified FSMs to an FSM tree. In [3], a branch-and-bound search algorithm for the determination of sets of matching states is presented. One of the best-known solutions is the STAMINA program [4], which can work in heuristic and exact variants and applies explicit enumeration to the solution of the state minimization task. Another implementation of the merging procedure is presented in Ref. [5], which describes a program for parallel state reduction and state encoding, where incompletely specified state codes can be built.

The value of splitting the states of FSMs in state encoding procedure was declared in [6] and soon after in [7], where the splitting operation was used to decrease the power dissipation and resource utilization of the designed FSM. Ref. [8] describes an application of state splitting for the simultaneous state minimization and the state assignment of the FSM.

Many authors have pondered the synthesis methods for high-speed FSMs implemented on programmable logic devices with a large variety of approaches. Ref. [9] considers the problem of state encoding and optimization of the combinational part upon the implementation of high-performance FSMs in complex programmable logic devices (CPLD). Ref. [10] presents a novel architecture that is particularly optimized for implementation of reconfigurable FSMs; this architecture is called the transition-based reconfigurable FSM (TR-FSM) and shows a significant reduction in area, speed, and power consumption in relation to FPGA architectures. In [11], the implementation of finite state machines in FPGA with the application of integral blocks of read-only memory (ROM) is described. The presented approach shows two pieces of FSMs structure with multiplexers on inputs of ROM blocks, which allow decreasing the area and increasing the FSM speed. Ref. [12] presents BT-FSM, which is a finite state machine with a single bit input, where the state transition graph is in a form of a binary tree. The architecture of FSM is based on the previously developed model of the finite virtual state machine (FVSM) [13]. In [14], a modification of the feedback of asynchronous FSMs and convergent state encoding is proposed. In this approach, asynchronous FSMs can be realized as simply as synchronous ones. Ref. [15] presents the extended burst-mode architecture, based on local synchronization signals, which allows using approaches for synchronous machines, for the synthesis of asynchronous machines. The increase in speed of FSM can be achieved also by using a state splitting procedure. Ref. [16] presents the method based on the splitting of internal states, which makes it possible to decrease the ranks of transition functions and decrease the number of logic levels of transition functions.

Many approaches to the power consumption reduction of state machines have been recently proposed. They are mostly based on special state encoding procedures, decomposition, device clocking control and others. In Ref. [17], a genetic fuzzy c-mean clustering-based decomposition method, named GFCM-D, is offered for FSM partition into a set of c-fuzzy clusters. To reach low power consumption, the target function of GFCM-D is minimized with the application of a genetic algorithm. Partitioning is widely used for FSM power minimization because, most of the time, only one of the sub-FSMs has to be clocked; in consequence, energy is saved. Ref. [18] proposes a multi-population evolution strategy, denoted as MPES, to accomplish the task of searching for a low-power state encoding in FSM synthesis. MPES resolves this problem by using inner and outer evolution strategies (ES). In the inner strategy, subpopulations evolve independently and are responsible for local search in separate regions, while the outer strategy plays the role of a shell to optimize the subpopulations of inner-ES for improved solutions. Ref. [19] proposes a fast algorithm based on state transition probabilities and simple control logic to realize the partitioned machines. An effective method for decreasing dynamic power by reducing the switching activity is clock gating. Ref. [20] presents a consolidated and fine-grained architecture-level clock gating mechanism for low-power hardware accelerators which are automatically created by a high-level synthesis tool. Another method introduces clock gating into both the state logic (DGS) and the output logic (DGO) of the FSM individually and can be applied in most cases in any FSM [21]. The gating control logic automatically extracts information from the FSM state description. The desired adjacency graph to reduce the power dissipation is used in the method from [22].
A low-power state-encoding technique with upper bound peak current constraints is proposed in [23]. Ref. [24] presents a synthesis methodology dedicated to low power implementations of combinatorial circuits in FPGA devices. In this method, Boolean functions are defined by BDD (binary decision diagram). A brand-new structure of the switch activity BDD is suggested, which uses a function decomposition to minimize the switching activity of the logic. The algorithm of state encoding based on a decomposition and probabilistic description of the FSM is proposed in [25]. In this method, a binary tree with nodes created by sharing a finite state machine is used.

In several papers, the area is reduced concurrently with the minimization of the power dissipation in the phase of state encoding. Most works [26,27,28,29] propose genetic algorithms for this purpose. A new methodology of logic decomposition with application of BDDs is offered in [30]. The core of the proposed algorithm is the multiple cutting of a BDD. Additionally, methods of searching for the finest technology mapping focused on the configurability of FPGA logic blocks are described. Refs. [31,32] propose a novel technology-dependent design method which produces FSMs with three levels of logic blocks and regular systems of connections between the logic levels. The algorithm is based on splitting the set of internal states into two subsets. Each subset relates to a unique fragment of an FSM. The offered algorithm is placed in the area of two-fold state assignment techniques. In [33], the method based on constructing a partition for the set of output variables is proposed. It minimizes the number of extra variables encoding the collections of output variables (COVs).

The analysis of the known works shows that none of them simultaneously optimizes the occupied area, speed, and power consumption in the primary phase of the synthesis process through the concurrent merging and splitting of the internal states of the FSM. Methods that claim to consider several optimization criteria at the same time are, in fact, reduced to the traditional approach, for each stage of which several algorithms are proposed.

In this paper, three heuristic strategies for optimizing incompletely specified FSMs are proposed. At the stage of merging and splitting states, they take into account the parameters of the technological base and the method of state encoding used in the synthesis in order to improve parameters such as the area, speed, and power consumption. The considered approach is focused on the implementation of finite automata on FPGAs based on look-up tables. In system-on-programmable-chip (SoC) devices, the programmable section has the FPGA architecture, so the proposed techniques can also be applied when implementing state machines on SoC devices.

2. Materials and Methods

2.1. Idea of the Method

The idea of the approach is to execute sequential operations of splitting or merging the states if it is possible. In this way, new equivalent FSMs are built, whose further implementation can give various outcomes in terms of area, speed, and power consumption.

In this article, three different strategies for combining and splitting states are shown:

Merge-then-split strategy (MS), where state merging is performed first and, when it is impossible, state splitting is performed.
Split-then-merge strategy (SM), where state splitting is performed first and, when it is impossible, state merging is performed.
Combined strategy (COMB), where the decision as to whether to perform state merging or splitting is made at each subsequent finite automaton transformation operation.

There can be also different optimization criteria considered in the proposed strategies: power consumption, speed, area, and balanced optimizations. To calculate the evaluation parameters and then implement the FSM, a state assignment using the selected method should be performed. To determine which state to split or which pair of states to choose, trial merging and trial splitting operations are executed.

The implementation cost is not a critical boundary because contemporary FPGA devices have a large number of logical elements. For this reason, the area parameter is not considered in this work, but it was investigated in earlier works [34].

2.2. Estimation of Optimization Criteria

For the estimation of the optimization criteria, all states (for splitting) or couples of states (for merging) should be considered in sequence. For each state or couple of states, a trial splitting or merging is executed. Next, the internal states are encoded applying one of the common encoding techniques, and the set of logic functions relating to the combinational part of the FSM is constructed. After that, for the state to split or pair of states to merge, power consumption P_i, critical delay path S_i or transformation quality Q_i ratios are estimated.

2.2.1. Estimation of Power Consumption

In most cases, the power dissipation in the digital circuits is a combination of the two components: static power—associated with the staying in some state (e.g., high level on the outputs); and the dynamic power—associated with alternating the state of the device. The dynamic power dissipation of the digital system depends on the frequency of switching the output registers.

The static power (also called leakage power) is generally the result of the unwanted subthreshold current in the transistor channel when the transistor is turned off. It depends on the supply voltage, the switching threshold voltage, and the transistor size. All these parameters depend on the technological base, which is used for circuit implementation. Using equivalent transformations of the FSM and different state assignment methods at the stage of logic synthesis, we cannot change these parameters. To reduce the static power, the methods such as dynamic voltage and frequency scaling, multi-voltage threshold and power gating should be used additionally.

A finite state machine is a tuple F = {A, X, Y, φ, ψ, a_reset}. In this notation, A is an M-element set of internal states A = {a₁, …, a_M}, with one selected initial state (a_reset). The set X is an L-element set of input values X = {x₁, …, x_L}, and the set Y is an N-element set of output values Y = {y₁, …, y_N}. There are also two functions describing the behavior of the FSM depending on the input vector: transition function φ and output function ψ. Transition function

φ : A \times X \to A

defines the next FSM state, depending on the present FSM state and the input vector. Output function

ψ : A \times X \to Y

for Mealy FSM, or

ψ : A \to Y

for Moore FSM, defines the output vector for a current state and input vector (Mealy) or only for a current state (Moore).

Additionally, for real implementations of FSMs in digital systems, the tuple F also contains a set of codes C = {c₁, …, c_M} whose cardinality is equal to the cardinality of the set A because each code c_i relates to the state a_i. Each code can be saved as an R-bit vector, where

R \in \langle \lceil \log_{2} M \rceil, M \rangle.

R is also the number of memory elements needed to store the code of the present FSM state. Moreover, all codes of states should be orthogonal, i.e., there must be no two identical codes.
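The bounds on R can be checked with a few lines of Python. This is only an illustrative sketch: the function names `code_width_bounds`, `binary_codes` and `one_hot_codes` are hypothetical, and codes are represented as bit strings, which is an assumption made here for readability.

```python
def code_width_bounds(num_states: int) -> tuple[int, int]:
    """Return (R_min, R_max) = (ceil(log2 M), M) for M internal states."""
    if num_states < 1:
        raise ValueError("an FSM needs at least one state")
    # (M - 1).bit_length() equals ceil(log2 M) for every M >= 1.
    return (num_states - 1).bit_length(), num_states


def binary_codes(num_states: int) -> list[str]:
    """Minimum-width binary codes, one per state."""
    width = max(1, code_width_bounds(num_states)[0])
    return [format(i, f"0{width}b") for i in range(num_states)]


def one_hot_codes(num_states: int) -> list[str]:
    """Maximum-width (one-hot) codes, one per state."""
    return ["".join("1" if i == j else "0" for j in range(num_states))
            for i in range(num_states)]


def codes_are_distinct(codes: list[str]) -> bool:
    """Check the requirement that no two states share a code."""
    return len(set(codes)) == len(codes)
```

For M = 5 states the sketch yields a width between 3 (binary) and 5 (one-hot) memory elements.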

The method which was described in [35] can be used to compute the dynamic power consumption of an FSM. The method is based on the state assignment and the probability of a “1” (or “0”) appearing on the input. Then, the power dissipation of an FSM can be described by the formula

P_{total} = \sum_{r = 1}^{R} P_{r},

(1)

where P_total—entire dynamic power dissipation of FSM; and P_r—dynamic power dissipation of r-th flip-flop (a state code memory element). The power consumption of each flip-flop is determined by the expression.

P_{r} = \frac{1}{2} V_{d d}^{2} \times f \times C \times S A_{r},

(2)

where P_r—power dissipated by memory element r; V_dd—supply voltage; f—operating frequency; C—output capacitance of each flip-flop; and SA_r—switching activity of r-th flip-flop, r ∈ <1, R>.

Let c_i be a binary vector used as a code of state a_i. Assuming that the number of bits of code c_i is equal to R, let V^r(c_i) represent the value of r-th bit of code c_i of state a_i, r ∈ <1, R>.

Then, the switching activity SA_r of memory element r is described by the following formula:

S A_{r} = \sum_{i = 1}^{M} \sum_{j = 1}^{M} P (a_{i} \to a_{j}) \times (V^{r} (c_{i}) ⨁ V^{r} (c_{j})),

(3)

where P(a_i → a_j) is the probability of transition from state a_i to state a_j (a_i, a_j ∈ A), and ⨁ is the logic operator “exclusive or” (XOR). The probability of transition P(a_i → a_j) can be calculated using the following equation:

P (a_{i} \to a_{j}) = P (a_{i}) \times P (X (a_{i}, a_{j})),

(4)

where P(a_i)—probability that the a_i is the current state of the FSM; and P(X(a_i, a_j))—probability that the input vector is equal to X(a_i, a_j), which causes a transition from state a_i to state a_j.

Let V^b(X) represent the value of the b-th variable of input vector X. The probability P(X(a_i, a_j)) that input vector of the FSM is identical to X(a_i, a_j) is described by the equation

P (X (a_{i}, a_{j})) = \prod_{b = 1}^{L} P (V^{b} (X (a_{i}, a_{j})) = d),

(5)

where d ∈ {“1”, “0”, “–”}; and P(x_b = d)—the probability that input variable x_b from input vector X(a_i, a_j) is identical to d.

In our method, we assume that probabilities of both 0 and 1 on any FSM input are the same, thus P(x_b = 0) = P(x_b = 1) = ½ and P(x_b = “–”) = 1. Notice that we do not consider the correlations between the values on individual inputs.
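Under this assumption, Equation (5) reduces to multiplying ½ for every specified input bit. A minimal sketch follows; the function name `cube_probability` is hypothetical, and the don't-care symbol is written as an ASCII hyphen for convenience.

```python
from fractions import Fraction


def cube_probability(cube: str) -> Fraction:
    """Probability of an input vector given as a string over '0', '1', '-'.

    Each specified bit contributes 1/2, each don't-care contributes 1,
    and the inputs are treated as independent.
    """
    probability = Fraction(1)
    for symbol in cube:
        if symbol in "01":
            probability *= Fraction(1, 2)
        elif symbol != "-":
            raise ValueError(f"unexpected symbol {symbol!r} in input vector")
    return probability
```

For example, the input vector "1-0" has two specified bits, so its probability is 1/4.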

Next, from the following system of equations, we can determine the probability P(a_i) that a current state of FSM is a_i, i = <1, M>:

P (a_{i}) = \sum_{j = 1}^{M} P (a_{j}) \times P (X (a_{j}, a_{i})), i = 〈 1, M 〉 .

(6)

When no transitions between states a_j and a_i exist, it can be assumed that P(X(a_j, a_i)) = 0. Consequently, when transitions from the state a_j to state a_i exist, the value P(X(a_j, a_i)) is defined as a sum of the probabilities for every input vector, which causes a transition from state a_j to state a_i.

Formula (6) denotes a linear system of M equations in M variables P(a₁), …, P(a_M). The system is linearly dependent, and the number of its solutions is infinite. However, the machine is always in one of its internal states, so Formula (7) holds:

\sum_{i = 1}^{M} P (a_{i}) = 1 .

(7)

To solve system (6), one of its equations should be substituted by Equation (7). The power estimation algorithm is fully described in [35].
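The computation in Equations (1)–(7) can be sketched in Python as follows. This is not the implementation from [35]; it is a small illustration that assumes the transition probabilities P(X(a_j, a_i)) are already collected in a matrix `T` (with `T[j][i]` for the transition from a_j to a_i) and that codes are bit strings. All function names are hypothetical. The linear system is solved with exact fractions, and the last equation of (6) is the one replaced by (7).

```python
from fractions import Fraction


def state_probabilities(T):
    """Solve system (6) with its last equation replaced by (7)."""
    m = len(T)
    # Row i encodes: sum_j P(a_j) * T[j][i] - P(a_i) = 0.
    rows = [[Fraction(T[j][i]) - (1 if i == j else 0) for j in range(m)]
            + [Fraction(0)] for i in range(m)]
    rows[-1] = [Fraction(1)] * m + [Fraction(1)]  # sum_i P(a_i) = 1
    # Gauss-Jordan elimination; a singular system raises StopIteration.
    for col in range(m):
        pivot = next(r for r in range(col, m) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(m):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][m] for i in range(m)]


def switching_activity(T, codes):
    """Equation (3): one switching activity value per code bit."""
    p = state_probabilities(T)
    m = len(codes)
    activity = []
    for r in range(len(codes[0])):
        total = Fraction(0)
        for i in range(m):
            for j in range(m):
                if codes[i][r] != codes[j][r]:  # XOR of the r-th bits
                    total += p[i] * Fraction(T[i][j])  # Equation (4)
        activity.append(total)
    return activity


def dynamic_power(T, codes, vdd, freq, cap):
    """Equations (1) and (2): sum of per-flip-flop dynamic power."""
    return sum(0.5 * vdd ** 2 * freq * cap * float(sa)
               for sa in switching_activity(T, codes))
```

As a usage example, take a two-state machine that leaves a₁ with probability ½ and always returns from a₂: `T = [[Fraction(1, 2), Fraction(1, 2)], [1, 0]]`. The state probabilities are 2/3 and 1/3, and with codes "0" and "1" the single flip-flop has a switching activity of 2/3.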

2.2.2. Estimation of Critical Path

In general, the architecture of contemporary FPGAs can be characterized as a set of logic elements based on look-up tables (LUTs). The LUTs can implement any Boolean function, with a small number of input variables (usually 4–8), so they can be called function generators. When the number of arguments of logic functions is greater than the number of LUT inputs n, the logic function should be decomposed regarding the number of arguments [36]. The most common decomposition methods are linear (serial) and parallel.

The length of the critical path of combinational part of FSM defines the speed of work of an entire FSM. This parameter is equal to the number of logic elements participating in the critical path. The maximum number of arguments L_max of the logic functions implemented in the combinational part of the FSM can be determined after state assignment and creating transition functions. If the technological base of implementation is a FPGA device, the length of the critical path is defined only by parameter L_max. When the linear decomposition is applied, it can be formulated as

S_i = 1 + int((L_max − n)/(n − 1)).

(8)

When the parallel decomposition is used, it can be described as

S_i = int(log_n L_max).

(9)

The complete critical path estimation process is presented in [37].
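Equations (8) and (9) can be evaluated with integer arithmetic only. The sketch below follows the formulas as written, with int(·) read as truncation. The function names are hypothetical, and clamping L_max − n at zero (so that a function fitting into a single LUT gives one level) is an assumption added here.

```python
def critical_path_linear(l_max: int, n: int) -> int:
    """Equation (8): S = 1 + int((L_max - n) / (n - 1)), for n >= 2."""
    if n < 2:
        raise ValueError("a LUT needs at least two inputs")
    extra_arguments = max(0, l_max - n)  # assumption: never below one level
    return 1 + extra_arguments // (n - 1)


def critical_path_parallel(l_max: int, n: int) -> int:
    """Equation (9): S = int(log_n L_max), computed without floating point."""
    if n < 2:
        raise ValueError("a LUT needs at least two inputs")
    levels = 0
    while n ** (levels + 1) <= l_max:
        levels += 1
    return levels
```

For instance, with 4-input LUTs and L_max = 10, the linear estimate is 1 + int(6/3) = 3 levels.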

2.2.3. Estimation of Transformation Quality Ratio

If we want to use different criteria to evaluate the quality of the merging or splitting the states, a weighted sum can be used, which is one of the well-known methods of discrete multicriteria optimization [38]. Due to its ease, this method is probably the most widespread solution. In this method, a scalar cost function is specified as an aggregation of costs with weights.

Let F = (F₁,…, F_d) be a d-dimensional function and let λ = (λ₁,…, λ_d) be a vector which fulfils the following conditions:

\forall j \in [1 \dots d], λ_{j} > 0,

(10)

\sum_{j = 1}^{d} λ_{j} = 1 .

(11)

The λ-aggregation of F is the following function:

F^{λ} = \sum_{j = 1}^{d} λ_{j} F_{j} .

(12)

Naturally, the elements of λ correspond to the relative significance (weight) of each criterion. In this case, we have two criteria: power P_i and speed S_i. Their weights can be specified by the user as w_P (power) and w_S (speed), respectively. With respect to the above considerations, the transformation quality (aggregation) function Q_i for a single state (for splitting) or any pair of states (for merging) can be specified as follows:

Q_{i} = w_{P} {\hat{P}}_{i} + w_{S} {\hat{S}}_{i},

(13)

where {\hat{P}}_{i} and {\hat{S}}_{i} are the normalized criteria parameters P_i and S_i. The normalization is performed to eliminate the influence of the wide range of magnitudes of the considered parameters. The normalization can be described by the formula

{\hat{K}}_{i} = (K_{max} - K_{i}) / (K_{max} - K_{min}),

(14)

where K_i—one of considered criteria parameters (P_i or S_i), K_max = 2·K_i, K_min = 0. The assumed values of K_min and K_max ensure that the initial value of the transformation quality ratio will be equal to 0.5.
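A short sketch of Equations (13) and (14) is given below. It assumes that K_max is fixed at twice the value measured for the initial FSM, which is the reading under which the initial quality ratio equals 0.5; the function names and the default weights are hypothetical.

```python
def normalize(value: float, initial_value: float) -> float:
    """Equation (14) with K_min = 0 and K_max = 2 * initial value (assumed)."""
    k_min = 0.0
    k_max = 2.0 * initial_value
    return (k_max - value) / (k_max - k_min)


def quality_ratio(power: float, speed: float,
                  initial_power: float, initial_speed: float,
                  w_p: float = 0.5, w_s: float = 0.5) -> float:
    """Equation (13): weighted sum of normalized power and speed."""
    if w_p <= 0 or w_s <= 0 or abs(w_p + w_s - 1.0) > 1e-9:
        raise ValueError("weights must be positive and sum to 1")
    return (w_p * normalize(power, initial_power)
            + w_s * normalize(speed, initial_speed))
```

With these definitions, a transformation that halves the estimated power while keeping the critical path unchanged raises the ratio from 0.5 to 0.625 for equal weights. Lower power or a shorter critical path always gives a higher ratio, which is why the balanced variant maximizes Q_i.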

2.3. State Merging Procedure

The merging procedure is based on the algorithm for the minimization of the number of FSM states offered in [39]. The idea of this algorithm relies on the sequential merging of only two states. For this reason, the set G of all pairs of internal states which satisfy the merging conditions is determined at each step. Next, for every pair in the set G, a trial merging is performed. Then, the pair that leaves the highest chance for merging other pairs in the next step is selected for the final merging.

Two machine states a_s and a_t can be joined (replaced by one state a_st) if they are equivalent, which means that the FSM behavior stays the same after merging. The FSM behavior does not change after merging states a_s and a_t if the conditions of transitions from the states a_s and a_t that lead to different states are orthogonal. If transitions from states a_s and a_t lead to the same state, then the conditions of these transitions should be equal. Additionally, the output vectors produced at these transitions should not be orthogonal. Please note that wait states can be created during the merging procedure. The methods of choosing pairs of states to merge and the merging algorithm are fully described in [39].
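The merging conditions can be illustrated with a small check over transition lists. This is a simplified sketch and not the algorithm from [39]. Transitions are assumed to be tuples (condition, next state, output) with conditions and outputs written as strings over '0', '1', '-'. The names are hypothetical, transitions with orthogonal conditions are treated as never conflicting, and the case where the next state is a_s or a_t itself is ignored.

```python
def orthogonal(first: str, second: str) -> bool:
    """Two vectors are orthogonal if some position holds 0 in one and 1 in the other."""
    return any({x, y} == {"0", "1"} for x, y in zip(first, second))


def can_merge(transitions_s, transitions_t) -> bool:
    """Check the merging conditions for the transitions of two states."""
    for cond_s, next_s, out_s in transitions_s:
        for cond_t, next_t, out_t in transitions_t:
            if orthogonal(cond_s, cond_t):
                continue  # the two transitions can never fire together
            if next_s != next_t:
                return False  # overlapping conditions lead to different states
            if cond_s != cond_t:
                return False  # same target state requires equal conditions
            if orthogonal(out_s, out_t):
                return False  # the outputs contradict each other
    return True
```

For example, two states that both go to a1 on input 0 with outputs "1-" and "-1" pass the check, because the outputs can be combined into "11"; changing one output to "0-" makes the pair unmergeable.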

2.4. State Splitting Procedure

The procedure of splitting the internal states of the FSM is an equivalent transformation of the FSM that does not change its behavior, general structure, and type. Therefore, including splitting into the synthesis process while implementing the finite state machine in FPGA devices is useful and can be simply added to the procedure of system design.

The state splitting procedure may lead to a decrease in power dissipation of the FSM [7] and to a gain in its speed of operation [40]. Any splitting of states leads to a growth in the number of states and hence, may lead to an increase in the number of memory elements needed for FSM implementation (increasing the cost). For this reason, the state-splitting procedure, taking into account the cost of realization of the FSM, is not considered in this paper.

2.4.1. State Splitting Procedure for Power Minimization

Using the classic state encoding methods, exactly one orthogonal code is assigned to each internal state. This implies applying codes with a Hamming distance greater than one. It may lead to an increase in switching activity of the memory elements used for saving the codes of FSM states. Of course, it is difficult or even impossible to guarantee a Hamming distance equal to 1 for all codes. The splitting of the internal states is one of the solutions to this problem. This operation gives more chances to find the couple of codes with the Hamming distance equal to one. Therefore, this should lead also to the decrease in power dissipation in the FSM [7].

Let X_P(a_i) = {z ∈ Z: φ(a_j, z) = a_i, a_i ∈ A, a_j ∈ A} be the set of all input vectors, which cause the transitions to the state a_i. Let X_F(a_i) = {z ∈ Z: φ(a_i, z) = a_k, a_i ∈ A, a_k ∈ A} be the set of all input vectors, which trigger the transition from the state a_i.

Any state a_i with card(X_P(a_i)) > 1 can be split into two new states a_{i}^{(1)} and a_{i}^{(2)}. After this procedure, the state a_i is substituted with states a_{i}^{(1)} and a_{i}^{(2)} such that we have the following:

Sets X_F for the new states are the same as the set for source state:

$X_{F} (a_{i}^{(1)}) = X_{F} (a_{i}^{(2)}) = X_{F} (a_{i}),$

(15)

Set X_P of the source state a_i is split into two individual components:

$X_{P} (a_{i}^{(1)}) \cup X_{P} (a_{i}^{(2)}) = X_{P} (a_{i}), X_{P} (a_{i}^{(1)}) \cap X_{P} (a_{i}^{(2)}) = \emptyset .$

(16)
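Conditions (15) and (16) can be illustrated on a transition list. In the sketch below, transitions are assumed to be tuples (source, condition, destination, output), the incoming transitions of the split state are partitioned by their list index rather than by individual input vectors, and the new states are named by appending "_1" and "_2". These choices and the function name are hypothetical.

```python
def split_state(transitions, state, first_group):
    """Split `state` into two states; `first_group` holds the indices of the
    incoming transitions redirected to the first new state."""
    incoming = {i for i, t in enumerate(transitions) if t[2] == state}
    chosen = set(first_group) & incoming
    if not chosen or chosen == incoming:
        raise ValueError("both new states need at least one incoming transition")
    first, second = f"{state}_1", f"{state}_2"
    result = []
    for index, (src, cond, dst, out) in enumerate(transitions):
        if dst == state:
            # Condition (16): each incoming transition goes to exactly one copy.
            dst = first if index in chosen else second
        if src == state:
            # Condition (15): both copies keep all outgoing transitions.
            result.append((first, cond, dst, out))
            result.append((second, cond, dst, out))
        else:
            result.append((src, cond, dst, out))
    return result
```

Splitting state b in the list `[("a", "0", "b", "0"), ("c", "1", "b", "1"), ("b", "-", "a", "0")]` with `first_group={0}` sends the transition from a to b_1 and the transition from c to b_2, while both new states keep the transition back to a.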

The procedure of splitting the internal states of the FSM is reversible; hence, the machine can return to its previous form by merging the states a_{i}^{(1)} and a_{i}^{(2)} into one state a_i. After splitting, the number of internal states of the final FSM is greater, but the average number of input vectors that cause transitions to a state is lower. Additionally, it becomes more feasible to assign codes with a smaller Hamming distance, which lowers the power consumption of the synthesized FSM.

2.4.2. State Splitting Procedure for Critical Path Minimization

The state splitting procedure for speed maximization comes from Ref. [40] but is adapted to use both binary and one-hot types of encoding. Just like in the work [40], the key strategy relies on searching for the set D of all states fulfilling the conditions for splitting:

\exists a_{i} \in A, card (B (a_{i})) > 1,

(17)

\exists a_{j} \in B (a_{i}), r_{j} \leq r^{*},

(18)

where r_j is the number of arguments of the function that initiates the transition to state a_i, r* is the upper limit of the number of arguments for all transition functions, A is a set of internal states, and B(a_i) is a set of states with transitions to state a_i.

If the conditions are satisfied, for each state from the set D the trial splitting is made. Each state a_i ∈ D is split into two new states. The first state is related to transitions from state a_j ∈ B(a_i), where r_j = max. The second state is related to the remaining transitions to state a_i. Finally, state a_i is selected for real splitting, which best fits the optimization criteria in regard to the FSM operation speed (minimization of critical path length S_i).

2.5. General FSM Synthesis Method

The general synthesis method uses two equivalent transformations of FSMs: a splitting and a merging. For this purpose, two sets are created: D—a set of states that can be split; and G—a set of state pairs that can be joined. Next, for each equivalent machine, the power consumption P_i, the maximum critical delay path (speed) S_i, and the cost of implementation (area) C_i, are calculated. The area parameter is not considered in this paper, as it was mentioned before. From the obtained results, a state (for splitting) or a couple of states (for merging) is selected for which the considered parameter is lowest (in case of speed or power) or highest (when using the balanced method) after the modification of the FSM.

In the merge-then-split (MS) strategy, there is always a merging that is performed first and after all possible merges, the splitting of states should be done. This strategy for the speed minimization is described using Algorithm 1. If we want to consider another criterion of optimization (e.g., power), we should replace the S_i parameter with the P_i parameter. In the case of using a balanced variant, we should use the transformation quality ratio Q_i and replace all “lower than” operators with “greater than” operators in Algorithm 1.

At the start of Algorithm 1, an initial FSM form is saved as the best one (line 1). Next, the subroutine for seeking couples for merging (building the set G) is executed (line 3). If there is no possibility to merge the states, the algorithm moves to the splitting phase, otherwise the trial merging is performed in the following way: first, the present FSM is saved, merging is executed, then the states are encoded, and the critical path ratio for current FSM is determined (lines 6–15). Among all solutions, the one is selected for which the critical path ratio S_i is minimal. After that, the real merging is performed, and a selection of states for the next merging is executed once more (lines 16–21).

Algorithm 1. General algorithm for FSM synthesis (speed-aware merge-then-split strategy).

1: best_FSM ← FSM, last_FSM ← FSM
2: S_i ← MAX_S_VALUE
3: G ← FindMergePairs(FSM)
4: WHILE G ≠ ∅ DO
5:   S_m ← MAX_S_VALUE
6:   FOR EACH (a_s, a_t) ∈ G DO
7:     Save(FSM)
8:     FSM ← Merge(FSM, (a_s, a_t))
9:     Encode(FSM)
10:     IF CriticalPath(FSM) < S_m THEN
11:       S_m ← CriticalPath(FSM)
12:       Selected_Pair ← (a_s, a_t)
13:     END IF
14:     Restore(FSM)
15:   END FOR
16:   FSM ← Merge(FSM, Selected_Pair)
17:   last_FSM ← FSM
18:   IF S_m < CriticalPath(best_FSM) THEN
19:     best_FSM ← FSM, S_i ← S_m
20:   END IF
21:   G ← FindMergePairs(FSM)
22: END WHILE
23: D ← FindSplitStates(FSM)
24: WHILE D ≠ ∅ DO
25:   FOR EACH a_i ∈ D DO
26:     Save(FSM)
27:     FSM ← Split(FSM, a_i)
28:     Encode(FSM)
29:     IF CriticalPath(FSM) < S_i THEN
30:       S_s ← CriticalPath(FSM)
31:       Selected_State ← a_i
32:     END IF
33:     Restore(FSM)
34:   END FOR
35:   IF S_s < CriticalPath(last_FSM) THEN
36:     FSM ← Split(FSM, Selected_State)
37:     last_FSM ← FSM, No_Split ← FALSE
38:   ELSE
39:     No_Split ← TRUE
40:   END IF
41:   IF S_s < CriticalPath(best_FSM) THEN
42:     best_FSM ← FSM, S_i ← S_s
43:   END IF
44:   IF No_Split = FALSE THEN
45:     D ← FindSplitStates(FSM)
46:   ELSE
47:     D ← ∅
48:   END IF
49: END WHILE
50: END

After the merging phase, the subroutine for seeking states for splitting (building the set D) is performed (line 23). If there is no possibility to split any states, the algorithm stops, otherwise, the trial splitting is executed as follows. First, the present FSM is saved, splitting is executed, then the states are encoded, and the critical path ratio of FSM is determined (lines 25–34). Among all solutions, the one is selected for which the critical path ratio S_i is minimal. Then the real splitting is performed, and a selection of states for the next splitting is executed once more (lines 35–48). The final FSM form is the one with the lowest critical path ratio from all considered equivalent forms during the work of the algorithm.

The splitting process may be divergent, and therefore the stop condition for splitting is included in the algorithm. It is made in lines 35–40 of Algorithm 1, where the critical path ratio S_s of the splitting FSM at this time is compared to the identical value determined for the last completed splitting. If the splitting does not lead to a further decrease in the critical path, it should not be executed.
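The control flow of the merge-then-split strategy can be condensed into a short Python sketch. It is a simplification of Algorithm 1 and not a line-by-line transcription: the Save and Restore steps are replaced by transformation functions that return a new FSM, the encoding step is assumed to happen inside the `cost` callback, and every trial split is compared against the others rather than against S_i. All parameter names are hypothetical callbacks supplied by the caller.

```python
def merge_then_split(fsm, find_merge_pairs, merge,
                     find_split_states, split, cost):
    """Greedy merge-then-split search; returns the cheapest FSM seen."""
    best = last = fsm

    # Merging phase: always apply the pair with the lowest trial cost.
    pairs = find_merge_pairs(fsm)
    while pairs:
        chosen = min(pairs, key=lambda pair: cost(merge(fsm, pair)))
        fsm = last = merge(fsm, chosen)
        if cost(fsm) < cost(best):
            best = fsm
        pairs = find_merge_pairs(fsm)

    # Splitting phase: stop as soon as the best trial split no longer
    # improves on the last accepted FSM (the stop condition in the text).
    states = find_split_states(fsm)
    while states:
        chosen = min(states, key=lambda state: cost(split(fsm, state)))
        candidate = split(fsm, chosen)
        if cost(candidate) >= cost(last):
            break
        fsm = last = candidate
        if cost(fsm) < cost(best):
            best = fsm
        states = find_split_states(fsm)
    return best
```

Passing a critical path estimate as `cost` corresponds to the speed-oriented variant, and a power estimate gives the power-oriented one. For the balanced variant, the negated quality ratio can be used so that minimizing the cost maximizes Q_i.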

In the split-then-merge (SM) strategy, splitting is always performed first, and the merging of states is done after all possible splits. The algorithm for this strategy can be obtained from Algorithm 1 by exchanging the merging phase (lines 3–22) with the splitting phase (lines 23–49).

In the combined strategy (COMB), a trial merging and a trial splitting are performed at each step. Then the decision is made as to which transformation (splitting or merging) should finally be performed, depending on the selected criteria. The combined strategy for the balanced variant of optimization is described in the form of Algorithm 2.

Algorithm 2. General algorithm for FSM synthesis (balanced combined strategy).

1: best_FSM ← FSM, Last_FSM ← FSM
2: G ← FindMergePairs(FSM)
3: D ← FindSplitStates(FSM)
4: Q_i ← 0.5
5: WHILE G ≠ ∅ AND D ≠ ∅ DO
6:     Q_m ← 0
7:     WHILE G ≠ ∅ DO
8:         Save(FSM)
9:         FSM ← Merge(FSM, (a_s, a_t) ∈ G)
10:         Encode(FSM)
11:         IF TransformationRatio(FSM) > Q_m THEN
12:             Q_m ← TransformationRatio(FSM)
13:             Selected_Pair ← (a_s, a_t)
14:         END IF
15:         Restore(FSM)
16:     END WHILE
17:     Q_s ← 0
18:     WHILE D ≠ ∅ DO
19:         Save(FSM)
20:         FSM ← Split(FSM, a_i ∈ D)
21:         Encode(FSM)
22:         IF TransformationRatio(FSM) > Q_i THEN
23:             Q_s ← TransformationRatio(FSM)
24:             Selected_State ← (a_i)
25:         END IF
26:         Restore(FSM)
27:     END WHILE
28:     IF Q_m > Q_s THEN
29:         FSM ← Merge(FSM, Selected_Pair)
30:         Q_i ← Q_m
31:     ELSE
32:         IF Q_s > TransformationRatio(Last_FSM) THEN
33:             FSM ← Split(FSM, Selected_State)
34:             Q_i ← Q_s
35:             Last_FSM ← FSM, No_Split ← FALSE
36:         ELSE
37:             No_Split ← TRUE
38:         END IF
39:     END IF
40:     IF Q_i > TransformationRatio(best_FSM) THEN
41:         best_FSM ← FSM
42:     END IF
43:     G ← FindMergePairs(FSM)
44:     IF No_Split = FALSE THEN
45:         D ← FindSplitStates(FSM)
46:     ELSE
47:         D ← ∅
48:     END IF
49: END WHILE
50: END

At the start of Algorithm 2, the initial FSM form is stored as the best one (line 1). Next, the subroutines that search for pairs to merge (building the set G) and for states to split (building the set D) are executed (lines 2–3). If merging or splitting of states is impossible, the algorithm ends; otherwise, the trial merging and splitting procedures are executed as follows. First, the current FSM is stored; next, the merging or splitting procedure is executed, the states are encoded, and the transformation ratio of the FSM is determined (lines 7–27). Among all solutions, the one with the maximal transformation ratio Q_i = max(Q_s, Q_m) is selected, where Q_s and Q_m are the transformation quality ratios for splitting and merging, respectively. Finally, the actual merging or splitting procedure is performed, and the subroutines that select states for the next merging or splitting are executed again (lines 28–48). The final FSM form is the one with the highest transformation quality ratio among all equivalent forms considered (lines 40–41).

As in the previously described strategies, the splitting process may diverge. For that reason, a stop condition for splitting is included. It is implemented in lines 32–38 of Algorithm 2.
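One step of the combined strategy can be sketched in Python as shown below. It is only an illustration of the decision rule: the FSM is an opaque value, the trial transformations and the quality function are passed in as callables, and the names `combined_step`, `toy_merge`, `toy_split` and `scores` are hypothetical. As a simplifying assumption, the split is compared with the ratio of the current FSM rather than with a separately tracked Last_FSM.

```python
from typing import Callable, Iterable, Tuple, TypeVar

F = TypeVar("F")  # opaque FSM representation (hypothetical)


def combined_step(
    fsm: F,
    merge_pairs: Iterable,
    split_states: Iterable,
    merge: Callable,
    split: Callable,
    ratio: Callable[[F], float],
) -> Tuple[F, str]:
    """Run all trial merges and trial splits, then apply at most one of them."""
    # Trials never modify `fsm`: merge/split are assumed to return new objects.
    best_merge = max((merge(fsm, p) for p in merge_pairs), key=ratio, default=None)
    best_split = max((split(fsm, s) for s in split_states), key=ratio, default=None)
    q_m = ratio(best_merge) if best_merge is not None else 0.0
    q_s = ratio(best_split) if best_split is not None else 0.0

    if best_merge is not None and q_m > q_s:
        return best_merge, "merge"
    # Stop condition for splitting: split only if it improves the ratio.
    if best_split is not None and q_s > ratio(fsm):
        return best_split, "split"
    return fsm, "none"


# Toy data: FSMs are labels and their quality ratios are looked up in a table.
def toy_merge(fsm: str, pair: str) -> str:
    return fsm + "+m" + pair


def toy_split(fsm: str, state: str) -> str:
    return fsm + "+s" + state


scores = {"A": 0.50, "A+m12": 0.60, "A+m34": 0.70, "A+s5": 0.65}
step = combined_step("A", ["12", "34"], ["5"], toy_merge, toy_split, scores.__getitem__)
print(step)  # ('A+m34', 'merge')
```

Here the best trial merge (0.70) beats the best trial split (0.65), so the merge is applied. If every merge scored below the split, the split would be applied only when it also exceeded the ratio of the current FSM.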

After one of the variants of the general synthesis algorithm has been executed, the number of FSM transitions and the number of input variables should also be minimized, if necessary, as explained in [39].

3. Results

The three proposed strategies for the synthesis of FSMs were implemented as part of a system for the optimization of digital systems based on programmable logic devices. To estimate the efficiency of the proposed strategies, we used MCNC FSM benchmarks [41]. Four methods of state assignment were investigated: binary, one-hot, JEDI (default output dominant algorithm) [42] and power-optimized sequential encoding [43]. For all three strategies (MS, SM and COMB), three different optimization criteria were used: power consumption, speed of operation and a balanced variant with identical weights for the power and speed parameters (50%). Combined with the four types of encoding, this gives 36 different variants of the synthesis method considered in the paper.
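The count of 36 variants follows from combining the three strategies, three optimization directions and four encodings. A short Python sketch makes the enumeration explicit; the constant names are chosen here for illustration only.

```python
from itertools import product

STRATEGIES = ("MS", "SM", "COMB")
DIRECTIONS = ("power", "speed", "balanced")
ENCODINGS = ("binary", "one-hot", "JEDI", "sequential")

# Every (strategy, direction, encoding) triple is one synthesis variant.
variants = list(product(STRATEGIES, DIRECTIONS, ENCODINGS))
print(len(variants))  # 36
```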

The example experimental results for binary encoding and power-oriented optimization are presented in Table 1, where Name is a benchmark filename, C₀, S₀ and P₀ are, respectively, the number of used logic elements (cost), maximum critical path described by a number of logic levels (speed), and dissipated power in milliwatts of the initial FSM before synthesis; C₁, S₁ and P₁ are, respectively, the cost, speed and dissipated power after synthesis using the MS strategy; and C₂, S₂ and P₂ are, respectively, the cost, speed and dissipated power after synthesis using the SM strategy. Finally, C₃, S₃ and P₃ are the same parameters obtained using the COMB strategy. Power dissipation was evaluated using the following values: output capacitance C = 3 pF, frequency f = 5 MHz, supply voltage V_CC = 5 V, input probability P(x_i = 1) = 0.5. Values of #M_X and #S_X are the numbers of merges and splits performed during the procedure. Similar tables were made for the other variants, but only the statistical parameters are presented in this section.
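For orientation, the scale implied by these electrical parameters can be checked with the widely used CMOS switching-power model P = ½·C·V²·f·N, where N is the average number of transitions per clock cycle. This model is an assumption made for illustration. The estimation formula actually used for the values in Table 1 is defined elsewhere in the paper and may differ in detail. The function name `dynamic_power_mw` and the parameter `activity` are hypothetical.

```python
def dynamic_power_mw(
    activity: float,
    c_farads: float = 3e-12,  # output capacitance C = 3 pF
    v_volts: float = 5.0,     # supply voltage V_CC = 5 V
    f_hz: float = 5e6,        # clock frequency f = 5 MHz
) -> float:
    """Switching power in milliwatts under the assumed model P = 0.5*C*V^2*f*N."""
    watts = 0.5 * c_farads * v_volts ** 2 * f_hz * activity
    return watts * 1e3


# One full transition per clock cycle on a single node:
print(dynamic_power_mw(1.0))
```

Under this assumed model, one node switching every cycle dissipates about 0.19 mW, so the totals of tens to hundreds of milliwatts in Table 1 would correspond to the summed activity of many nodes.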

It can be seen in Table 1 that, for power-oriented optimization, the power consumption of the transformed FSMs is lower than or equal to that of the initial FSM in all investigated cases. It can also be noticed that the number of merges and splits depends on the strategy used. It has the lowest average values for the MS strategy and significantly higher values for the SM and COMB strategies.

To examine the efficiency of the strategies with different state assignments and optimization directions, the gain/loss ratios were calculated. The gain/loss ratio is the ratio of the value of the considered parameter for the initial FSM to its value for the transformed FSM, so a ratio greater than 1 indicates an improvement. The minimum, average and maximum ratios for the MS strategy are presented in Table 2. The average values are the geometric means of the ratios calculated for each benchmark.
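The gain/loss computation can be illustrated with a few rows of Table 1 (initial power P₀ and power after the MS strategy P₁, binary encoding, power direction). The Python sketch below uses only three benchmarks, so its geometric mean is not the full-table average reported in Table 2; the helper names `gain_ratio` and `geometric_mean` are chosen here for illustration.

```python
import math

# (P0, P1) in milliwatts, copied from Table 1 for three benchmarks
TABLE1_SAMPLE = {
    "BBARA": (62.14, 61.71),
    "EX6": (274.01, 274.01),
    "LION9": (192.57, 84.38),
}


def gain_ratio(initial: float, transformed: float) -> float:
    """Initial value divided by transformed value; above 1 means a gain."""
    return initial / transformed


def geometric_mean(values) -> float:
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


ratios = {name: gain_ratio(p0, p1) for name, (p0, p1) in TABLE1_SAMPLE.items()}
summary = (min(ratios.values()), geometric_mean(ratios.values()), max(ratios.values()))
print("min/avg/max = %.2f/%.2f/%.2f" % summary)
```

For this sample, the minimum ratio is 1.00 (EX6, unchanged) and the maximum is about 2.28 (LION9), which agrees with the minimum and maximum listed for binary encoding and the power direction in Table 2.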

For the MS strategy, the average results obtained using the presented method are in all cases better than the results for the initial FSM with regard to the parameter corresponding to the optimization direction (e.g., power for the power direction), for all encoding styles used. It can also be noticed that the encoding type, in many cases, has a major influence on the result obtained using a specific optimization variant. When speed optimization is used, the estimated power consumption increases significantly in many cases. This confirms that these two directions contradict each other. Similar observations can be made for the other two strategies, the results of which are shown in Table 3 (SM strategy) and Table 4 (COMB strategy).

The proposed strategies were also compared to the methods described in our previous works, where only the state merging procedure was considered. The results for the merging strategy (M) are presented in Table 5. In addition to the three described optimization directions (power, speed and balanced), the state minimization direction [39] was examined. As can be seen, adding the state splitting procedure to the method increases the gain ratios in most cases for all examined parameters, i.e., power, speed, and area.

To compare the three distinct aspects considered in the experiments (strategy, optimization direction and encoding), the average values of all parameters were calculated for each point of view. In Figure 1a, we can see that the MS strategy is best for speed and area optimization, and the SM strategy for power optimization. We can also see that all strategies produce better results than the initial FSMs, not only in terms of speed and power, but also for the area parameter.

Figure 1b shows a comparison of the results for different optimization directions. It confirms that the power direction gives better results in the power aspect, and speed direction, in the speed aspect. It can be also noticed that balanced optimization gives satisfactory results in all three aspects (power, speed, and area).

In the case of the M strategy (from previous works), the results in terms of speed and area were significantly worse than those obtained from the MS, SM and COMB strategies. Only the average power dissipation was similar to the value obtained using the COMB strategy (Figure 1a). The state minimization direction gives the best results for the area parameter and comparable results for power and speed parameters (Figure 1b).

Figure 2 shows a comparison of the results for different encoding methods. It shows that one-hot encoding style gives the best results in terms of speed, as predicted. The most suitable encoding methods for power optimization are the binary and JEDI methods. The sequential assignment, which was designed especially for power optimization, gives moderate results in terms of power consumption. Using all encoding types, the average results for the transformed FSM are always better than those of the initial FSM.

To check the effectiveness of the proposed strategies, the benchmarks converted by the proposed synthesis method were also synthesized and implemented using Intel Quartus Prime and Xilinx (AMD) Vivado tools. Four scenarios were chosen:

Initial FSM with default encoding (provided by Quartus Prime or Vivado);
Power direction with JEDI encoding;
Speed direction with one-hot and binary encoding;
Balanced variant with binary encoding.

All scenarios were performed for all three considered strategies (MS, SM and COMB). All benchmarks were synthesized using the same design flow parameters (balanced mode). The Quartus Prime and Vivado tools also have their own optimization procedures for performance, area, and power parameters, but they mostly operate in the fitting and routing phase. The equivalent conversions of FSMs do not consider this phase; they operate only in the pre-synthesis stage. For this reason, we decided to use default compiler parameters for all examinations. Three output values were extracted from the report files for further analysis: total logic elements, maximum clock frequency and total power. For the implementation, the authors chose the EP4CE115F29I8L FPGA device from the Cyclone IV E family (Intel) and the XC7A35TSCG324-1 FPGA device from the Artix-7 family (Xilinx). The example results for the MS strategy are shown in Table 6, where C₀, F₀ and P₀ are, respectively, the cost of implementation (number of used logic elements), maximum frequency (in MHz) and dissipated power (in milliwatts) of the initial FSM (without transformation); C₁, F₁ and P₁ are, respectively, the same parameters after the power direction transformation with JEDI encoding; C₂, F₂ and P₂ are, respectively, the same parameters after using the speed direction variant with one-hot encoding; and finally, C₃, F₃ and P₃ are, respectively, the same parameters after synthesis using the balanced variant with binary encoding.

The comparison of average values of all investigated parameters for all scenarios and variants in the case of using the Quartus Prime tool for implementation is depicted in Figure 3. It can be noticed that the proposed strategies can be successfully used with the Quartus Prime tool. The worst results were obtained using the power optimization direction. Poor results in terms of power arise from the fact that the optimized power parameter (dynamic power) is significantly less than the static device power and has minimal influence on the total device power. Although all considered scenarios were optimized for speed or power, the most significant gain was noticed for the area parameter.

A similar comparison of average values of area, speed and power parameters for all scenarios and variants in the case of implementation using the Vivado tool is presented in Figure 4. It can be noticed that, in most cases, the proposed strategies can also be used successfully with the Vivado tool. The worst results in terms of area were obtained using the balanced optimization direction, but area was not the optimization goal. As with the Quartus Prime tool, the lack of significant gain in terms of power is due to the fact that the optimized dynamic power is considerably less than the total device power.

4. Conclusions

The transformation of finite state machines is an important phase in the FSM synthesis process. Typically, state merging is used to reduce the number of memory elements, but state splitting can be useful for minimizing the power dissipation and the critical path of the combinational part of the FSM. In this paper, three different strategies for FSM transformations were presented. In addition, the offered approach makes it possible not only to reduce the number of FSM states and optimize the key FSM parameters, but also to minimize the number of FSM transitions and input variables through the application of additional algorithms.

The merge-then-split strategy is the best solution for speed and area optimization, and the split-then-merge strategy is better for power optimization. It can be noticed that all strategies produce better results than the initial FSMs, not only in terms of speed and power, but also in terms of the cost of implementation.

The most important objective of this approach is not to find the FSM representation with a minimal number of states, but to find a form of the FSM that has the optimal value of the considered parameter, e.g., speed, power consumption or the transformation quality ratio (the weighted sum of the speed and power parameters). One of the most significant conclusions from the research is that the finite state machine with a minimum number of states is, in many cases, not the best result with regard to power consumption and speed.

The implemented method can be successfully used with commercial EDA tools, such as Intel Quartus Prime and Xilinx (AMD) Vivado. These tools have their own implementation optimization techniques, but they do not perform equivalent transformations of FSMs; they can only change the style of the state assignment or minimize the logic functions (output and transition functions) at the stage of logic synthesis. Most of the optimization work in industrial EDA software is performed at post-synthesis stages, such as fitting and routing, so it will also be applied to state machines transformed with the proposed method.

In the proposed approach, only the merging of a pair of states is considered, and a state is split into only two states. The algorithm can be reworked to merge a larger group of states and to split a state into more than two states. Moreover, the offered method can be further improved by taking into account incompletely specified values of the transition functions as supplementary conditions for the possibility of merging or splitting the FSM internal states.

The automatic selection of the strategy and of the weights for the balanced variant is also being considered, so as to find the FSM that is optimal in terms of all criteria taken into account. For that purpose, the use of artificial intelligence methods, e.g., neural networks, could be useful.

Author Contributions

Conceptualization, A.K. and V.S.; methodology, A.K.; software, A.K.; validation, A.K. and V.S.; formal analysis, V.S.; investigation, A.K.; writing—original draft preparation, A.K.; supervision, V.S. All authors have read and agreed to the published version of the manuscript.

Funding

The work was supported by the WZ/WI-IIT/4/2020 grant from Bialystok University of Technology and funded with resources for research by the Ministry of Education and Science in Poland.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The address of benchmark data set used in this paper is as follows: https://ddd.fit.cvut.cz/www/prj/Benchmarks/MCNC.7z (accessed on 1 July 2022).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Pfleeger, C.F. State reduction in incompletely specified finite state machines. IEEE Trans. Comput. 1973, C-22, 1099–1102.
2. Pena, J.M.; Oliveira, A.L. A new algorithm for exact reduction of incompletely specified finite state machines. IEEE Trans. Comput.-Aided Des. 1999, 18, 1619–1632.
3. Gören, S.; Ferguson, F. On state reduction of incompletely specified finite state machines. Comput. Electr. Eng. 2007, 33, 58–69.
4. Rho, J.-K.; Hachtel, G.; Somenzi, F.; Jacoby, R. Exact and heuristic algorithms for the minimization of incompletely specified state machines. IEEE Trans. Comput. Aided Des. 1994, 13, 167–177.
5. Avedillo, M.J.; Quintana, J.M.; Huertas, J.L. SMAS: A program for concurrent state reduction and state assignment of finite state machines. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), Singapore, 11–14 June 1991; pp. 1781–1784.
6. Yuan, L.; Qu, G.; Villa, T.; Sangiovanni-Vincentelli, A. An FSM reengineering approach to sequential circuit synthesis by state splitting. IEEE Trans. Comput. Aided Des. 2008, 27, 1159–1164.
7. Grzes, T.N.; Solov’ev, V.V. Minimization of Power Consumption of Finite State Machines by Splitting Their Internal States. J. Comput. Syst. Sci. Int. 2015, 54, 367–374.
8. Avedillo, M.J.; Quintana, J.M.; Huertas, J.L. State merging and state splitting via state assignment: A new FSM synthesis algorithm. IEE Proc. Comput. Digital Tech. 1994, 141, 229–237.
9. Czerwinski, R.; Kania, D. Synthesis method of high speed finite state machines. Bull. Pol. Acad. Sci. Tech. Sci. 2010, 4, 635–644.
10. Glaser, J.; Damm, M.; Haase, J.; Grimm, C. TR-FSM: Transition-based reconfigurable finite state machine. ACM Trans. Reconfig. Technol. Syst. (TRETS) 2011, 3, 23.
11. Garcia-Vargas, I.; Senhadji-Navarro, R. Finite state machines with input multiplexing: A performance study. IEEE Trans. Comput. Aided Des. Integr. Circ. Syst. 2015, 5, 867–871.
12. Senhadji-Navarro, R.; Garcia-Vargas, I. High-performance architecture for binary-tree-based finite state machines. IEEE Trans. Comput. Aided Des. 2018, 37, 796–805.
13. Senhadji Navarro, R.; García Vargas, I. Finite Virtual State Machines. IEICE Trans. Inf. Syst. 2012, E-95-D, 2544–2547.
14. Pedroni, V.A. Introducing deglitched-feedback plus convergent encoding for straight hardware implementation of asynchronous finite state machines. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), Lisbon, Portugal, 24–27 May 2015; pp. 2345–2348.
15. De Faria Barbosa, F.T.; De Oliveira, D.L.; Curtinhas, T.S.; De Abreu Faria, L.; De Souza Luciano, J.F. Implementation of Locally-Clocked XBM State Machines on FPGAs Using Synchronous CAD Tools. IEEE Trans. Circuits Syst. I Regul. Pap. 2017, 64, 1064–1074.
16. Solov’ev, V.V. Synthesis of Fast Finite State Machines on Programmable Logic Integrated Circuits by Splitting Internal States. J. Comput. Syst. Sci. Int. 2022, 61, 360–371.
17. Tao, Y.; Wang, Q.; Zhang, Y. Genetic Fuzzy c-mean clustering-based decomposition for low power FSM synthesis. In Proceedings of the IEEE Congress on Evolutionary Computation (CEC), San Sebastian, Spain, 5–8 June 2017; pp. 642–648.
18. Tao, Y.Y.; Zhang, L.J.; Wang, Q.Y.; Chen, R.; Zhang, Y.Z. A multi-population evolution strategy and its application in low area/power FSM synthesis. Nat. Comput. 2019, 18, 139–161.
19. Li, S.; Choi, K. A high performance low power implementation scheme for FSM. In Proceedings of the International SoC Design Conference (ISOCC), Jeju, Korea, 3–6 November 2014; pp. 190–191.
20. Riahi Alam, M.; Salehi Nasab, M.E.; Fakhraie, S.M. Power Efficient High-Level Synthesis by Centralized and Fine-Grained Clock Gating. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2015, 34, 1954–1963.
21. Nag, A.; Das, S.; Pradhan, S.N. Low-power FSM synthesis based on automated power and clock gating technique. J. Circuits Syst. Comput. 2019, 28, 1920003.
22. Sait, S.M.; Oughali, F.C.; Arafeh, A.M. FSM State-Encoding for Area and Power Minimization Using Simulated Evolution Algorithm. J. Appl. Res. Technol. 2012, 10, 845–858.
23. Wang, L.-Y.; Chu, Z.-F.; Xia, Y.-S. Low Power State Assignment Algorithm for FSMs Considering Peak Current Optimization. J. Comput. Sci. Technol. 2013, 28, 1054–1062.
24. Kubica, M.; Opara, A.; Kania, D. Logic Synthesis Strategy Oriented to Low Power Optimization. Appl. Sci. 2021, 11, 8797.
25. Kajstura, K.; Kania, D. Low Power Synthesis of Finite State Machines State Assignment Decomposition Algorithm. J. Circuits Syst. Comput. 2018, 27, 1850041.
26. Xia, Y.; Almaini, A.E.A. Genetic algorithm based state assignment for power and area optimization. IEE Proc. Comput. Digit. Tech. 2002, 149, 128–133.
27. Chaudhury, S.; KrishnaTejaSistla, K.T.; Chattopadhyay, S. Genetic algorithm based FSM synthesis with area-power trade-offs. Integr. VLSI J. 2009, 42, 376–384.
28. Chattopadhyay, S.; Yadav, P.; Singh, R.K. Multiplexer targeted finite state machine encoding for area and power minimization. In Proceedings of the IEEE India Annual Conference, Kharagpur, India, 20–22 December 2004; pp. 12–16.
29. Aiman, M.; Sadiq, S.M.; Nawaz, K.F. Finite state machine state assignment for area and power minimization. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), Island of Kos, Greece, 21–24 May 2006; pp. 5303–5306.
30. Kubica, M.; Kania, D. Area-oriented technology mapping for LUT-based logic blocks. Int. J. Appl. Math. Comput. Sci. 2017, 27, 207–222.
31. Barkalov, A.; Titarenko, L.; Mielcarek, K. Improving characteristic of LUT based Mealey FSMs. Int. J. Appl. Math. Comput. Sci. 2020, 30, 745–759.
32. Barkalov, A.; Titarenko, L.; Chmielewski, S. Improving Characteristics of LUT-Based Moore FSMs. IEEE Access 2020, 8, 155306–155318.
33. Barkalov, A.; Titarenko, L.; Chmielewski, S. Mixed encoding of collections of output variables for LUT-based mealy FSMs. J. Circuits Syst. Comput. 2019, 28, 1950131.
34. Klimowicz, A. Area Targeted Minimization Method of Finite State Machines for FPGA Devices. In Computer Information Systems and Industrial Management. CISIM 2018; Saeed, K., Homenda, W., Eds.; Lecture Notes in Computer Science 2018; Springer: Cham, Switzerland, 2018; Volume 11127, pp. 370–379.
35. Klimowicz, A.; Grzes, T. Combined State Merging and Splitting Procedure for Low Power Implementations of Finite State Machines. In Advances in Systems Engineering. ICSEng 2021; Borzemski, L., Selvaraj, H., Świątek, J., Eds.; Lecture Notes in Networks and Systems 2022; Springer: Cham, Switzerland, 2022; Volume 364, pp. 190–199.
36. Zakrevskij, A.D. Logic Synthesis of Cascade Circuits; Izdatel’stvo Nauka: Moscow, Russia, 1981. (In Russian)
37. Klimowicz, A. Combined State Splitting and Merging for Implementation of Fast Finite State Machines in FPGA. In Computer Information Systems and Industrial Management. CISIM 2020; Saeed, K., Dvorský, J., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2020; Volume 12133, pp. 65–76.
38. Zadeh, L.A. Optimality and non-scalar-valued performance criteria. IEEE Trans. Automat. Control 1963, AC-8, 59–60.
39. Klimowicz, A.S.; Solov’ev, V.V. Minimization of incompletely specified mealy finite-state machines by merging two internal states. J. Comput. Syst. Sci. Int. 2013, 52, 400–409.
40. Salauyou, V. Synthesis of High-Speed Finite State Machines in FPGAs by State Splitting. In Computer Information Systems and Industrial Management. CISIM 2016; Saeed, K., Homenda, W., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2016; Volume 9842, pp. 741–751.
41. Yang, S. Logic Synthesis and Optimization Benchmarks User Guide. Version 3.0; Technical Report; Microelectronics Center of North Carolina: Research Triangle Park, NC, USA, 1991.
42. Lin, B.; Newton, R.A. Synthesis of multiple level logic from symbolic high-level description languages. In Proceedings of the International Conference on VLSI, Cambridge, MA, USA, 2–4 October 1989; pp. 187–196.
43. Grzes, T.N.; Solov’ev, V.V. Sequential algorithm for low-power encoding internal states of finite state machines. J. Comput. Syst. Sci. Int. 2014, 53, 92–99.

Figure 1. Comparison of average results: (a) for different strategies; (b) for different optimization directions.

Figure 2. Comparison of average results for different encoding styles.

Figure 3. Comparison of average results after implementation using Quartus Prime tool.

Figure 4. Comparison of average results after implementation using Vivado tool.

Table 1. The experimental results for binary encoding and power-oriented optimization.

	Initial FSM			MS Strategy					SM Strategy					COMB Strategy
Name	C₀	S₀	P₀	C₁	S₁	P₁	#M₁	#S₁	C₂	S₂	P₂	#M₂	#S₂	C₃	S₃	P₃	#M₃	#S₃
BBARA	6	3	62.14	5	2	61.71	3	1	5	2	61.71	4	1	5	2	61.71	4	2
BBSSE	11	3	226.01	11	3	162.49	3	1	11	3	162.49	4	1	11	3	226.01	3	1
BBTAS	5	2	134.51	5	2	112.50	0	2	5	2	105.16	2	2	5	2	112.50	1	2
BEECOUNT	7	2	113.28	7	2	91.89	2	1	7	2	91.89	4	2	7	2	91.89	2	1
CSE	11	3	58.06	12	3	55.22	0	3	12	3	55.22	0	3	12	3	55.22	0	3
DK14	8	2	239.67	8	2	239.67	0	2	8	2	228.18	2	2	8	2	225.34	2	3
DK16	8	2	401.14	8	2	391.58	0	4	8	2	387.33	4	4	8	2	385.38	3	3
EX1	24	4	204.39	24	4	191.12	0	1	24	4	178.23	1	1	24	4	178.23	2	2
EX4	13	2	165.18	13	2	163.22	0	2	13	2	163.22	2	2	13	2	165.18	3	3
EX6	11	2	274.01	11	2	274.01	0	1	11	2	274.01	1	1	11	2	274.01	1	1
LION9	5	2	192.57	3	1	84.38	5	2	3	1	84.38	8	3	4	2	77.26	4	1
PLANET	25	3	422.84	25	3	351.51	0	8	25	3	351.51	8	8	25	3	351.51	8	8
S1	11	4	329.30	11	4	286.21	0	4	11	4	286.21	4	4	11	4	261.20	11	11
S1488	25	4	72.68	25	4	71.62	0	3	25	4	71.62	3	3	25	4	71.28	5	6
S1494	25	4	73.04	25	4	71.72	0	4	25	4	71.72	4	4	25	4	71.72	3	4
S27	4	2	192.75	4	2	156.78	1	1	4	2	161.50	3	2	4	2	156.78	2	1
S386	11	3	170.33	11	3	169.10	0	1	11	3	169.10	1	1	11	3	170.33	1	1
S420	7	3	150.00	7	3	107.81	0	1	7	3	107.81	1	1	7	3	150.00	1	1
S510	13	3	240.57	13	3	240.57	0	1	13	3	237.03	1	1	13	3	233.49	1	3
S832	24	4	134.16	24	4	134.14	0	2	24	4	134.14	2	2	24	4	134.14	2	2
SAND	14	4	215.62	15	4	182.12	0	2	15	4	182.12	2	2	15	4	182.12	2	2
SSE	11	3	226.01	11	3	162.49	3	1	11	3	162.49	4	1	11	3	226.01	3	1
TBK	8	4	263.16	9	4	228.21	0	11	9	4	228.21	0	11	9	4	228.21	0	11
TRAIN11	5	2	101.90	3	1	93.75	7	1	3	1	83.33	11	4	3	1	93.75	8	1

Table 2. The gain/loss ratios of results for merge-then-split strategy.

Parameter	Encoding	Power Direction	Speed Direction	Balanced
Power Min./Avg./Max.	Binary	1.00/1.15/2.28	0.65/1.02/2.05	1.00/1.15/2.28
	One-hot	1.00/1.01/1.14	0.99/1.00/1.07	1.00/1.02/1.14
	JEDI	1.00/1.13/1.58	0.38/0.92/1.21	0.69/1.08/1.45
	Sequential	1.00/1.07/1.47	0.90/1.00/1.18	0.83/1.04/1.24
Critical Path Min./Avg./Max.	Binary	1.00/1.05/2.00	1.00/1.08/2.00	1.00/1.08/2.00
	One-hot	1.00/1.09/2.00	1.00/1.08/2.00	1.00/1.08/2.00
	JEDI	1.00/1.00/1.00	1.00/1.08/2.00	1.00/1.08/2.00
	Sequential	0.67/0.98/1.00	1.00/1.08/2.00	1.00/1.08/2.00
Area Min./Avg./Max.	Binary	0.89/1.03/1.67	1.00/1.05/1.67	0.89/1.04/1.67
	One-hot	0.96/1.10/2.40	0.89/1.03/2.00	0.89/1.04/2.00
	JEDI	0.89/1.00/1.25	1.00/1.05/1.67	0.89/1.03/1.67
	Sequential	0.89/0.97/1.25	1.00/1.05/1.67	0.89/1.02/1.67

Table 3. The gain/loss ratios of results for split-then-merge strategy.

Parameter	Encoding	Power Direction	Speed Direction	Balanced
Power Min./Avg./Max.	Binary	1.00/1.15/2.30	0.82/1.02/2.05	1.00/1.16/2.28
	One-hot	1.00/1.01/1.14	0.86/1.00/1.14	1.00/1.02/1.14
	JEDI	1.00/1.17/1.64	0.38/0.96/1.22	0.69/1.13/1.45
	Sequential	1.00/1.07/1.47	0.77/0.99/1.13	0.99/1.06/1.24
Critical Path Min./Avg./Max.	Binary	1.00/1.02/1.50	1.00/1.08/2.00	1.00/1.08/2.00
	One-hot	1.00/1.09/2.00	1.00/1.08/2.00	1.00/1.09/2.00
	JEDI	1.00/1.00/1.00	1.00/1.08/2.00	1.00/1.05/2.00
	Sequential	0.67/0.98/1.00	1.00/1.08/2.00	1.00/1.05/2.00
Area Min./Avg./Max.	Binary	0.89/1.01/1.25	1.00/1.05/1.67	0.89/1.04/1.67
	One-hot	0.96/1.10/2.40	0.89/1.04/2.00	0.96/1.09/2.00
	JEDI	0.89/0.99/1.25	1.00/1.05/1.67	0.89/1.00/1.67
	Sequential	0.89/0.97/1.25	1.00/1.05/1.67	0.89/1.00/1.67

Table 4. The gain/loss ratios of results for combined strategy.

Parameter	Encoding	Power Direction	Speed Direction	Balanced
Power Min./Avg./Max.	Binary	1.00/1.11/2.49	0.83/1.01/1.45	1.00/1.12/2.49
	One-hot	1.00/1.01/1.14	0.77/1.00/1.13	1.00/1.02/1.14
	JEDI	1.00/1.14/1.62	0.38/0.92/1.35	0.69/1.08/1.45
	Sequential	1.00/1.07/1.47	0.68/0.98/1.27	0.83/1.04/1.24
Critical Path Min./Avg./Max.	Binary	1.00/1.02/1.50	1.00/1.03/2.00	1.00/1.05/2.00
	One-hot	1.00/1.09/2.00	1.00/1.08/2.00	1.00/1.09/2.00
	JEDI	1.00/1.00/1.00	1.00/1.03/2.00	1.00/1.08/2.00
	Sequential	0.67/0.98/1.00	1.00/1.03/2.00	1.00/1.08/2.00
Area Min./Avg./Max.	Binary	0.89/1.01/1.25	1.00/1.03/1.67	0.89/1.03/1.67
	One-hot	0.96/1.10/2.40	0.89/1.05/2.00	0.96/1.08/2.00
	JEDI	0.89/1.00/1.25	1.00/1.03/1.67	0.89/1.03/1.67
	Sequential	0.89/0.98/1.25	1.00/1.03/1.67	0.89/1.02/1.67

Table 5. The gain/loss ratios of results for merging only strategy.

Parameter	Encoding	Power Direction	Speed Direction	Balanced	State min.
Power Min./Avg./Max.	Binary	1.00/1.07/2.28	1.00/1.05/2.05	1.00/1.06/2.28	1.00/1.06/2.05
	One-hot	1.00/1.01/1.14	0.99/1.00/1.07	1.00/1.01/1.14	0.92/1.01/1.14
	JEDI	1.00/1.07/1.58	0.99/1.05/1.26	0.91/1.05/1.26	0.99/1.05/1.26
	Sequential	1.00/1.01/1.13	0.90/1.00/1.13	0.83/1.00/1.13	0.90/1.00/1.13
Critical Path Min./Avg./Max.	Binary	1.00/1.05/2.00	1.00/1.08/2.00	1.00/1.08/2.00	1.00/1.08/2.00
	One-hot	1.00/1.09/2.00	1.00/1.09/2.00	1.00/1.09/2.00	1.00/1.09/2.00
	JEDI	1.00/1.02/2.00	1.00/1.08/2.00	1.00/1.08/2.00	1.00/1.08/2.00
	Sequential	1.00/1.02/1.50	1.00/1.08/2.00	1.00/1.08/2.00	1.00/1.08/2.00
Area Min./Avg./Max.	Binary	1.00/1.04/1.67	1.00/1.05/1.67	1.00/1.05/1.67	1.00/1.05/1.67
	One-hot	1.00/1.10/2.40	1.00/1.07/2.00	1.00/1.10/2.40	1.00/1.11/2.40
	JEDI	1.00/1.03/1.67	1.00/1.05/1.67	1.00/1.05/1.67	1.00/1.05/1.67
	Sequential	1.00/1.02/1.25	1.00/1.05/1.67	1.00/1.05/1.67	1.00/1.05/1.67

Table 6. The example results after implementation using Quartus Prime tool for MS strategy.

	Initial FSM Default Encoding			Power Direction JEDI Encoding			Speed Direction One-Hot Encoding			Balanced Binary Encoding
Name	C₀	F₀	P₀	C₁	F₁	P₁	C₂	F₂	P₂	C₃	F₃	P₃
BBARA	24	416.15	132.87	23	425.89	132.88	21	424.09	132.6	21	418.59	132.88
BBSSE	37	332.01	135.57	40	387.9	135.59	40	386.55	135.6	41	386.7	135.68
BBTAS	9	680.27	131.98	9	693	131.98	10	657.03	131.98	11	657.89	131.99
BEECOUNT	31	392.62	131.71	28	471.92	133.07	25	472.81	134.07	22	471.48	133.93
CSE	108	225.02	135.46	94	191.2	135.48	97	215.56	133.18	118	201.09	131.64
DK14	39	359.32	135.13	58	317.56	132.59	39	359.32	135.13	39	359.32	135.13
DK16	66	420.88	132.1	66	423.73	132.11	64	418.06	133.99	70	360.36	132.12
EX1	216	157.23	143.14	158	187.06	143.12	176	200.56	135.16	171	191.31	143.38
EX4	27	513.61	134.12	24	534.76	136.34	25	460.62	136.35	28	577.37	136.36
EX6	43	308.17	137.38	49	279.64	133.47	53	351.25	133.48	43	308.17	137.38
LION9	20	490.2	132.1	13	500	131.06	11	506.33	131.06	7	526.87	131.04
PLANET	128	432.53	146.81	131	423.91	147.4	125	454.34	138.19	145	335.68	147.66
S1	131	197.71	138.19	142	193.84	138.22	134	195.16	138.21	183	190.73	138.4
S1488	364	168.8	147.58	443	152.95	148.14	346	169.87	147.61	432	151.91	148.27
S1494	328	162.84	147.61	364	166.17	147.62	344	174.92	147.61	408	159.41	148.15
S27	46	358.29	135.83	45	347.58	131.84	46	321.23	135.82	45	316.56	131.83
S386	135	173.55	136.79	127	175.25	137.76	127	193.5	136.84	121	176.55	136.48
S420	71	648.09	142.7	79	515.46	142.72	71	641.85	140.28	77	650.62	142.71
S510	247	144.26	145.79	237	146.54	146.13	212	161.08	145.72	208	160.08	145.86
S832	221	164.1	140.45	218	178.57	140.7	198	189.86	140.75	183	199.48	140.8
SAND	37	332.01	135.57	40	387.9	135.59	40	386.55	135.6	41	386.7	135.68
SSE	194	182.58	139.6	217	169.09	139.65	206	174.28	139.59	207	171.03	139.57
TRAIN11	19	515.2	132.07	9	577.7	131.06	8	577.03	131.05	7	578.03	131.05
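
To make the rows of Table 6 easier to compare, the following minimal sketch computes improvement ratios of an optimized FSM relative to the initial FSM for a few benchmarks. It rests on assumptions that this section does not state explicitly: C is taken to be the number of utilized logic cells, F the maximum operating frequency, and P the consumed power. The helper name `improvement_ratios` and the "ratio greater than 1 means improvement" convention are illustrative choices, not the authors' definitions.

```python
# Illustrative sketch only. Column meanings are assumed:
#   C = logic cells (lower is better)
#   F = maximum frequency (higher is better)
#   P = power (lower is better)

# Rows copied from Table 6: (C0, F0, P0) for the initial FSM and
# (C2, F2, P2) for the speed direction with one-hot encoding.
TABLE6_SPEED = {
    "EX1": ((216, 157.23, 143.14), (176, 200.56, 135.16)),
    "LION9": ((20, 490.2, 132.1), (11, 506.33, 131.06)),
    "TRAIN11": ((19, 515.2, 132.07), (8, 577.03, 131.05)),
}


def improvement_ratios(initial, optimized):
    """Return (area, speed, power) ratios; a value above 1 means improvement.

    Hypothetical helper: area and power are initial/optimized,
    speed is optimized/initial, so that larger is always better.
    """
    c0, f0, p0 = initial
    c1, f1, p1 = optimized
    return (c0 / c1, f1 / f0, p0 / p1)


if __name__ == "__main__":
    for name, (initial, optimized) in TABLE6_SPEED.items():
        area, speed, power = improvement_ratios(initial, optimized)
        print(f"{name}: area x{area:.2f}, speed x{speed:.2f}, power x{power:.2f}")
```

Expressing every criterion as a ratio in which larger is better keeps the three columns comparable at a glance; other normalizations are equally possible.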

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

