Article

Improving Hardware in LUT-Based Mealy FSMs

by Alexander Barkalov 1,2,*, Larysa Titarenko 1,3 and Kazimierz Krzywicki 4,*

1 Institute of Metrology, Electronics and Computer Science, University of Zielona Gora, Ul. Licealna 9, 65-417 Zielona Gora, Poland
2 Department of Computer Science and Information Technology, Vasyl Stus’ Donetsk National University, 600-Richya Str. 21, 21021 Vinnytsia, Ukraine
3 Department of Infocommunication Engineering, Faculty of Infocommunications, Kharkiv National University of Radio Electronics, Nauky Avenue 14, 61166 Kharkiv, Ukraine
4 Department of Technology, The Jacob of Paradies University, Ul. Teatralna 25, 66-400 Gorzow Wielkopolski, Poland
* Authors to whom correspondence should be addressed.
Appl. Sci. 2022, 12(16), 8065; https://doi.org/10.3390/app12168065
Submission received: 18 July 2022 / Revised: 5 August 2022 / Accepted: 10 August 2022 / Published: 11 August 2022
(This article belongs to the Section Electrical, Electronics and Communications Engineering)

Abstract

The main contribution of this paper is a novel design method that reduces the number of look-up table (LUT) elements in the circuits of three-block Mealy finite-state machines (FSMs). The proposed method uses codes of collections of outputs (COs) to represent both FSM state variables and outputs. The interstate transitions are represented by output collections generated during two adjacent cycles of FSM operation. To avoid doubling the number of variables encoding the COs, two registers are used. The first register keeps the code of the CO produced in the current cycle of operation; the second register keeps the code of the CO produced in the previous cycle. A synthesis example applying the proposed method is given, and the results of the experiments are reported. The research was conducted using the CAD tool Vivado by Xilinx. The experiments show that the proposed approach reduces the hardware compared with such known methods as auto and one-hot of Vivado, and JEDI. Additionally, the proposed approach gives better results than a method based on the simultaneous replacement of inputs and encoding of COs. Compared to circuits of the three-block FSMs, the LUT counts are reduced by an average of 7.21% without a significant reduction in performance. Our approach loses in terms of power consumption (on average 9.62%) and power–time products (on average 10.44%). The gain in LUT counts and area–time products grows as the numbers of FSM states and inputs increase.

1. Introduction

Numerous digital systems are now widely used in the daily life of human society [1,2]. Among other digital equipment, contemporary systems include many different sequential devices [3]. The behavior of a sequential device can be described by the model of the Mealy finite state machine (FSM) [4]. This model is used, for example, to specify the behavior of (1) control devices [5,6]; (2) serial communication and display protocols [7]; (3) various software tools of embedded systems [8]; (4) control-dominated systems [9]; (5) different systems in robotics [10] and so on. This wide applicability led to the choice of the Mealy FSM model in our recent research.
FSM-based design raises several optimization problems [5,7]. As a rule, the following characteristics of FSM circuits should be improved: the occupied chip area, the cycle time (the maximum operating frequency) and the consumed power. These characteristics are interrelated. For example, reducing the area leads to reducing the power consumption [11,12]. The approaches used for improving these characteristics depend strongly on the peculiarities of the logic elements used for implementing the FSM circuits. Changing the type of logic elements makes it necessary to change the optimization approaches. This is the reason for the continuous interest in developing new design methods aiming at the optimization of FSM circuits. Due to the great importance of the area reduction, we devote our article to this problem.
The area reduction of LUT-based FSM circuits may be achieved using the methods of structural decomposition (SD) [13]. In this case, an FSM circuit is represented as a composition of two to four large logical blocks. These blocks have unique systems of input variables and output functions distinguishing them from other circuit blocks [14]. In this article, we propose an alternative to the method discussed in [15]. The original method [15] amounts to the joint application of the replacement of inputs and the encoding of collections of outputs (COs) [5]. Applying these methods requires generating two additional systems of functions. Implementing circuits for these additional systems consumes some chip resources. In this article, we propose to use the same variables both for encoding the COs and for the replacement of the FSM inputs. This eliminates the block generating the additional variables that replace the FSM inputs. As a result, the number of LUTs is reduced compared to the equivalent circuit based on the approach of [15].
The main contribution of this paper is a novel design method which reduces the LUT count (the number of LUTs) in the circuits of three-level Mealy FSMs with the joint use of two methods of structural decomposition. The proposed method is based on using the same additional variables as inputs of the logic blocks generating both input memory functions (IMFs) and FSM outputs. Due to this, the block replacing FSM inputs, which is inherent in the method [15], is eliminated. The main purpose of the proposed method is to reduce the LUT count in the FSM circuit without significantly impairing the FSM performance.
The further text of the paper is organized in the following order. The second section contains basic information related to LUT-based Mealy FSMs. The third section discusses the necessary elements of the state of the art. In this section, we provide a critical analysis of existing synthesis methods and show the need for their improvement. The fourth section highlights the main idea of the proposed method. An example of synthesis is presented and analyzed in the fifth section. The sixth section includes the results of experiments and their analysis. A brief conclusion ends the paper.

2. Basics of LUT-Based Mealy FSM Design

In this section, we show the basics of designing FSM circuits using internal resources of FPGAs. Here, we introduce the main notation used in the rest of the text and show the features of the logic elements used. Finally, we introduce the simplest structural diagram of a Mealy FSM circuit implemented with LUTs and programmable flip-flops.
To help the reader, Table 1 lists the main sets of variables and the notation adopted in our article.
To start the design process, it is necessary to specify the FSM behavior. Various formal descriptions can be used for this [1]. Two forms are most commonly used: (1) a state transition graph (STG) and (2) a state transition table (STT) [16]. We use both forms in this article. These forms are used to derive systems of Boolean functions (SBFs) defining dependencies between FSM outputs and input memory functions on the one hand, and FSM inputs and state variables on the other hand. These SBFs are used to design FSM logic circuits [16].
The FSM inputs form a set $X = \{x_1, \ldots, x_L\}$, and the FSM outputs form a set $Y = \{y_1, \ldots, y_N\}$. The inputs determine transitions between FSM states combined into a set $A = \{a_1, \ldots, a_M\}$. To synthesize an FSM circuit, the states $a_m \in A$ are represented by binary codes $K(a_m)$ having $R$ bits. The states are encoded using state variables from the set $T = \{T_1, \ldots, T_R\}$. The $r$-th bit of a state code is represented by an internal state variable $T_r \in T$. The minimum value of $R$ is calculated using the following formula:
$$R = \lceil \log_2 M \rceil. \tag{1}$$
Formula (1) determines the so-called maximum state codes [17]. A special register $RG$ is introduced into the FSM circuit as a memory of the state codes [5]. In the case of FPGA-based FSMs, the $RG$ is implemented using D flip-flops [17,18]. The content of $RG$ is determined by the input memory functions combined into a set $\Phi = \{D_1, \ldots, D_R\}$. The IMFs are inputs of $RG$ showing the direction of a particular interstate transition.
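As a quick numerical check of Formula (1), the minimum code width can be computed as in the sketch below. The function name `min_state_bits` is our own illustrative choice and is not part of the original method.

```python
def min_state_bits(num_states: int) -> int:
    """Minimum number of state variables R = ceil(log2(M)) from Formula (1)."""
    if num_states < 1:
        raise ValueError("an FSM needs at least one state")
    # (M - 1).bit_length() equals ceil(log2(M)) for M >= 1 and avoids float rounding.
    return (num_states - 1).bit_length()


print(min_state_bits(7))  # M = 7 states
print(min_state_bits(5))  # M = 5 states
```

Both calls return 3, so FSMs with five to eight states need three state variables under this encoding.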
The SBFs that make it possible to synthesize an FSM circuit can be formed using a direct structure table (DST) [5]. A DST is constructed on the basis of either the initial STT or the STG. An STT includes the following columns [16]: a current state $a_m$ (a state for the current instant); a state of transition $a_T$ (a state for the next instant); an input signal $X_h$ determining the transition from $a_m$ into $a_T$ (it is a certain conjunction of inputs); a collection of outputs (CO) $Y_h$ formed during the transition $\langle a_m, a_T \rangle$; and the transition number $h$. There are $H$ lines in an STT. A DST includes all these columns and three additional columns [5]: the current state code $K(a_m)$, the next state code $K(a_T)$, and the IMFs $\Phi_h \subseteq \Phi$, which load the next state code into $RG$.
Using a DST, the following SBFs are constructed:
$$\Phi = \Phi(T, X); \tag{2}$$
$$Y = Y(T, X). \tag{3}$$
The SBF (2) corresponds to the FSM transition function [5] that specifies the dependence of the states of transition on the current states and input variables. The SBF (2) represents the rules for generating the IMFs necessary to load a next state code into the $RG$. The SBF (3) corresponds to the FSM output function [5] that specifies the dependence of the FSM outputs on the current states and input variables. The SBF (3) represents the rules for generating FSM outputs during each interstate transition. The SBFs (2) and (3) are the basis for the synthesis of FSM $U_1$, whose structural diagram is shown in Figure 1.
In Figure 1, the BlockYΦ implements the SBFs (2) and (3). The $RG$ includes $R$ flip-flops. The pulse Res loads the code of the initial state $a_1 \in A$ into $RG$. Very often, there are only zeros in the code $K(a_1)$ [18]. The synchronization pulse Clk allows loading state codes into $RG$.
Consider a transition between the states $a_3$ and $a_5$ of some Mealy FSM. Let it be the transition with $h = 6$. The transition is represented using fragments of three equivalent forms: an STG, an STT and a DST (Figure 2).
As follows from Figure 2a, the transition $\langle a_3, a_5 \rangle$ is caused by the input signal $X_6 = x_1 \bar{x}_2$. The transition is accompanied by producing the CO $Y_6 = \{y_2, y_4\}$. Row 6 of the STT (Figure 2b) is a sequence of characters corresponding to the fragment of the STG (Figure 2a). If, for example, $M = 7$, then using (1) gives $R = 3$ and two sets: $T = \{T_1, T_2, T_3\}$ and $\Phi = \{D_1, D_2, D_3\}$. For a trivial state assignment [5], there are the codes $K(a_3) = 010$ and $K(a_5) = 100$. These codes and the IMF $D_1$ are written in the sixth line of the DST (Figure 2c). This line determines a product term $F_6 = \bar{T}_1 T_2 \bar{T}_3 x_1 \bar{x}_2$. This term is a part of the sums of products (SOPs) of the Boolean functions $D_1 \in \Phi$ and $y_2, y_4 \in Y$. All other terms of the SOPs for (2) and (3) are obtained in the same way [5].
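The construction of a product term from one DST row can be sketched in a few lines. This is only an illustration: the helper `product_term` and the `~` notation for complemented literals are our own assumptions, not notation from the paper.

```python
def product_term(state_code: str, input_literals: list[str]) -> str:
    """Build the conjunction A_m X_h for one DST row as a readable string.

    state_code     -- current-state code K(a_m), e.g. "010"
    input_literals -- literals of X_h, with "~" marking a complemented input
    """
    state_part = [
        f"T{r}" if bit == "1" else f"~T{r}"
        for r, bit in enumerate(state_code, start=1)
    ]
    return " ".join(state_part + input_literals)


# Row h = 6 of the example: K(a3) = 010, X6 = x1 ~x2.
print(product_term("010", ["x1", "~x2"]))  # ~T1 T2 ~T3 x1 ~x2
```

The printed string matches the term $F_6$ derived in the text.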
In this paper, we discuss the case of implementing SBFs (2) and (3) using configurable logic blocks (CLBs) and other internal resources of FPGA chips [19]. To form an FSM circuit, the CLBs are connected using a programmable routing matrix [17,20]. We consider CLBs that include LUTs, multiplexers and programmable flip-flops. Similar to the notation used in the paper [21], we use the symbol $I$-LUT to denote a single-output LUT having $I$ inputs. Such a LUT can implement an arbitrary Boolean function having up to $I$ arguments. The analysis of the FPGA market shows that AMD Xilinx dominates this market [19]. For this reason, we focus our current research on the solutions of Xilinx. These solutions are currently very popular for the implementation of various projects, as confirmed by the analysis of the literature [22,23,24,25,26,27,28].
If the number of arguments of a Boolean function is greater than I, then the corresponding circuit can be implemented with the help of the functional decomposition (FD) [29,30,31,32]. In this case, the resulting circuits are, as a rule, multi-level. Additionally, they are characterized by very complex systems of “spaghetti-type” interconnections [13].
In LUT-based FSMs, the $RG$ is hidden and distributed among the CLBs generating IMFs. For this reason, the logic circuit of the LUT-based FSM $U_1$ consists of two logic blocks (Figure 3).
In Mealy FSM $U_1$, the block LSV consists of CLBs generating SBF (2). The state code is kept in the hidden register $RG$. For this reason, the pulses Res and Clk enter the block LSV. The outputs $y_n \in Y$ are generated by the block LY. This block does not include flip-flops; it implements SBF (3).

3. Related Work

This section provides a brief analysis of basic methods used for reducing the number of LUTs in FSM circuits. We show that this problem can be solved using either a certain state assignment or various methods of functional and structural decomposition. We show the disadvantages inherent in the methods from these three groups. The method proposed in this paper belongs to the group of structural decomposition methods.
Under certain conditions, there is only one level of LUTs in the circuit of $U_1$. To implement a single-level circuit, each function $\phi_k \in \Phi \cup Y$ should depend on no more than $I$ arguments. However, there are up to six address inputs in present-day LUTs [19,33,34]. To balance the area-spatial-power characteristics of a LUT, it is necessary that the number of inputs does not exceed six [35]. Nevertheless, the total number of inputs and state variables of an FSM can significantly exceed the value of $I$. This leads to an imbalance between a very large number of FSM inputs, outputs and states, on the one hand, and a very small number of LUT inputs, on the other hand. To reduce the negative impact of this imbalance, it is necessary to improve the design methods of FPGA-based FSMs.
The required chip area can be reduced by optimizing the system of interconnections of a particular circuit. Improving interconnections can reduce the power consumption because more than 70% of the power consumption is due to the interconnections [36]. Additionally, the interconnections affect the maximum operating frequency of the resulting FSM circuit. As shown in [36], the complexity of the interconnection system has an increasing negative impact on the propagation time of signals in FSM circuits. As follows from [15], the regularization of interconnections reduces both the cycle time and the power consumption. To regularize the interconnection system, it is necessary to use the structural decomposition methods [13,37].
If the condition
$$NA(\phi_k) \le I \tag{4}$$
holds for each function $\phi_k \in \Phi \cup Y$, then there are $N + R$ $I$-LUTs in a single-level circuit of the corresponding FSM $U_1$ (one LUT per function). However, if the condition (4) does not hold for some functions $\phi_k \in \Phi \cup Y$, then it becomes impossible to represent such an FSM with a single level of LUTs. To improve the characteristics of multi-level circuits, various methods can be applied.
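Condition (4) is easy to test mechanically once the number of arguments of each function is known. The sketch below assumes a dictionary of argument counts; the function name and the sample values are hypothetical and are not taken from the paper.

```python
def fits_single_level(num_args: dict[str, int], lut_inputs: int) -> bool:
    """True if every function has at most `lut_inputs` arguments (condition (4))."""
    return all(n <= lut_inputs for n in num_args.values())


# Hypothetical argument counts for a small FSM; not taken from the paper.
na = {"D1": 5, "D2": 4, "D3": 6, "y1": 3, "y2": 7}
print(fits_single_level(na, lut_inputs=6))  # False: y2 needs 7 arguments
```

When the check fails for at least one function, some form of decomposition is needed for that function.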
A significant number of optimization methods aimed at FPGA-based FSMs can be found in the literature [13,17,18,21,32,38,39,40,41]. As a rule, these methods can improve the value of one of the characteristics of the FSM circuit [39,40]. Additionally, there are methods which simultaneously reduce the values of two characteristics (area and power consumption, or area and performance). In our current paper, a method is proposed which aims at reducing the LUT count of three-block circuits of Mealy FSMs [15].
The values of $NA(\phi_k)$ can be reduced with the help of a proper state assignment [41,42,43]. The number of FSM state memory elements is in the range from $R = \lceil \log_2 M \rceil$ to $R = M$. The upper limit ($R = M$) corresponds to a one-hot state assignment. Both of these extreme approaches can be found in many CAD tools, such as SIS [44], ABC [32,45] or Sinthagate [46]. The manufacturers of FPGA chips also have their own tools for the technology mapping of LUT-based circuits. Examples of such systems are Vivado [47], Vitis [48], and Quartus [49]. The first two CAD systems were developed by AMD Xilinx, and the third one is a product of Intel (Altera).
It is impossible to specify an approach that is optimal for any FSM. For example, [50] compares the synthesis results for FSM circuits based on state codes with $R = \lceil \log_2 M \rceil$ and on one-hot state codes. Note that both of these approaches are widely used in most modern CAD tools. As follows from the comparison, the one-hot codes are the best choice for FSMs with more than 16 states. However, in addition to the value of $R$, the number of input variables also has a very strong influence on the characteristics of LUT-based FSM circuits. For example, the experiments [51] show the following: if the number of FSM inputs exceeds 10, then it is better to use the codes with a minimum number of bits.
As follows from this analysis, it is necessary to check which method leads to the best results for a specific combination of characteristics of a particular FSM. In this paper, we compared the results produced by our new approach with the characteristics of FSM circuits produced using the algorithm JEDI [44], and the methods auto ($R = \lceil \log_2 M \rceil$) and one-hot ($R = M$) of Vivado [47] by Xilinx [19]. Our choice of JEDI is due to the fact that it is considered one of the best deterministic methods of state encoding [44].
If condition (4) is violated, then various methods of functional decomposition should be applied to implement an FSM circuit [29,39]. All these methods are based on splitting the original SOP into sub-SOPs for which the number of arguments does not exceed the number of LUT inputs. Each sub-SOP corresponds to a partial function which differs from the initial function $\phi_k \in \Phi \cup Y$ [39]. This splitting should be executed in a way that increases the number of logic levels of the final FSM circuit as little as possible [29]. In practice, FD methods are included in every academic and industrial CAD tool dealing with LUT-based design. The main disadvantage of FD-based methods is that they produce FSM circuits with spaghetti-type interconnections [13]. It is known that such circuits lose in all three main characteristics to their counterparts with a regular interconnection system [52].
The methods of structural decomposition [13] are an alternative to the methods of FD. The main idea of these methods is the elimination of the direct connection between FSM inputs and state variables, on the one hand, and FSM outputs and IMFs, on the other hand. In the case of SD, an FSM circuit is represented as a composition of unique logic blocks. This leads to an increase in the number of implemented functions, but these partial functions are much simpler than functions (2) and (3). The analysis of these methods can be found, for example, in [13].
The first known methods of SD are the replacement of inputs (RI) and the encoding of the collections of outputs (ECO). They were proposed in the mid-twentieth century by M. Wilkes for the optimization of microprogram control units [53]. In [15], we proposed the joint use of these methods for the optimization of LUT-based Mealy FSMs’ circuits. Let us briefly describe these two methods.
In the case of RI, the set $X = \{x_1, \ldots, x_L\}$ is replaced by a set of additional variables $B = \{b_1, \ldots, b_J\}$, where $J \ll L$. The replaced inputs are represented by an SBF
$$B = B(T, X). \tag{5}$$
Each function of (5) represents a multiplexor. Its control inputs are connected with the state variables, and the data inputs are connected with the replaced inputs. In the case of CLB-based solutions, these multiplexors are implemented using LUTs and dedicated multiplexors [54].
There are $Q$ different COs. Each collection $Y_q \subseteq Y$ includes the FSM outputs generated during a particular interstate transition. As a rule, the condition $Q < H$ holds, where $H$ is the number of interstate transitions. The COs are encoded by binary codes $K(Y_q)$. The bits of $K(Y_q)$ are represented by elements of an additional set $Z = \{z_1, \ldots, z_{R_Q}\}$. The cardinality of the set $Z$ is determined as
$$R_Q = \lceil \log_2 Q \rceil. \tag{6}$$
To encode COs, two additional SBFs should be constructed:
$$Z = Z(T, X); \tag{7}$$
$$Y = Y(Z). \tag{8}$$
The SBFs (7) and (8) are implemented using LUTs. Obviously, the system (8) is represented by $R_Q$ decoders.
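The idea behind ECO can be illustrated with a small sketch: each distinct CO gets a short binary code, and a decoder maps the code back to the asserted outputs, which models $Y = Y(Z)$. The naive first-seen code assignment and both function names are our own assumptions; the paper uses a dedicated encoding approach instead.

```python
def encode_collections(collections: list[frozenset[str]]) -> dict[frozenset[str], str]:
    """Assign each distinct CO a binary code of ceil(log2(Q)) bits (naive order)."""
    distinct = list(dict.fromkeys(collections))        # keep first-seen order
    width = max(1, (len(distinct) - 1).bit_length())   # code width, at least one bit
    return {co: format(i, f"0{width}b") for i, co in enumerate(distinct)}


def decode_outputs(code: str, codes: dict[frozenset[str], str]) -> frozenset[str]:
    """Model of Y = Y(Z): return the outputs asserted for a given CO code."""
    inverse = {c: co for co, c in codes.items()}
    return inverse[code]


# Hypothetical COs used only to exercise the two helpers.
cos = [frozenset(), frozenset({"y1", "y2"}), frozenset({"y3"})]
codes = encode_collections(cos)
print(codes[frozenset({"y3"})])             # 10
print(sorted(decode_outputs("01", codes)))  # ['y1', 'y2']
```

A naive assignment like this one ignores the number of literals in the decoder functions, which is exactly what a careful CO encoding tries to minimize.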
Combining the methods of RI and ECO leads to the replacement of both SBFs (2) and (7). Now, the following SBFs should be constructed:
$$\Phi = \Phi(T, B); \tag{9}$$
$$Z = Z(T, B). \tag{10}$$
The SBFs (5), (8)–(10) determine a structural diagram of FSM $U_2$ (Figure 4).
In FSM $U_2$, a BlockB implements SBF (5). The variables $b_j \in B$ enter a BlockZΦ implementing SBFs (9) and (10). The IMFs $D_r \in \Phi$ enter the state code register $RG$. The variables $z_r \in Z$ are transformed into the FSM outputs $y_n \in Y$ by a BlockY.
In LUT-based FSMs, these blocks are implemented using the internal resources of CLBs, inter-slice interconnections, programmable input–outputs and synchronization tree buffers [54]. In [15], we compared the characteristics of $U_1$- and $U_2$-based FSMs. The research results obtained in [15] show that the joint use of RI and ECO significantly reduces the LUT counts in FSM circuits.
To optimize an FSM circuit, we propose using the variables $z_r \in Z$ for generating both FSM outputs and IMFs. To make this possible, we propose to use codes of COs generated in two neighboring instants of the FSM discrete time.

4. Main Idea of the Proposed Method

The analysis of FSM $U_2$ (Figure 4) reveals its shortcomings. The main drawback of $U_2$ is the need to form two systems of additional variables. One of them replaces the inputs $x_l \in X$, and the second encodes the collections of outputs. These systems are represented by SBFs (5) and (10), respectively. Implementing them consumes some internal resources of the FPGA chip. The amount of resources used can be reduced by using the same additional variables to implement both input memory functions and FSM outputs. Such an approach is proposed in this article. Our analysis of the literature has not revealed a method of this kind, which supports the novelty of the proposed method.
Our method is based on using the codes of collections of FSM outputs for generating the IMFs $D_r \in \Phi$. This idea is illustrated in Figure 5.
A subgraph of some STG is shown in Figure 5. The generator of pulses Clk sets the course of discrete time $t$ ($t = 0, 1, 2, \ldots$). Three instants of time are shown in Figure 5. At the instant $t$, the FSM is in the state $a(t) = a_4$. The transition from $a_3$ into $a_4$ is accompanied by producing the CO $Y_5$. So, the following relation holds: $Y_q(t) = Y_5$. From the STG (Figure 5), we can find that $a(t+1) = a_5$ and $Y_q(t+1) = Y_3$. So, the transition $\langle a_4, a_5 \rangle$ corresponds to the pair of COs $\langle Y_5, Y_3 \rangle$. This transition is caused by the input $x_2 \in X$. So, the pair $\langle a_4, x_2 \rangle$ also corresponds to the pair of COs $\langle Y_5, Y_3 \rangle$. This means that IMFs can be represented using only codes of COs.
In FSM $U_2$, the SOPs of the functions $D_r \in \Phi$ include product terms $F_h$ determined as
$$F_h = A_m B_h. \tag{11}$$
In (11), the symbol $A_m$ stands for a conjunction of the state variables corresponding to the code of the current state $a_m$ written in the $h$-th row of the DST; the symbol $B_h$ stands for a conjunction of additional variables replacing the input signal $X_h$ written in the $h$-th row of the DST ($h \in \{1, \ldots, H\}$). If a pair $\langle a_m, X_h \rangle$ determines the $h$-th transition of an FSM, then we propose to replace it by a pair of COs (as follows from Figure 5). So, we propose to construct the SOPs of the functions $D_r \in \Phi$ using product terms formed by conjunctions corresponding to the codes of the COs replacing the pair $\langle a_m, X_h \rangle$.
To do this, we should use different sets of variables to encode the COs $Y(t)$ and $Y(t+1)$. For example, we use the elements of the set $Z = \{z_1, \ldots, z_{R_Q}\}$ to encode a CO $Y(t+1)$ and the elements of a set $V = \{v_1, \ldots, v_{R_Q}\}$ to encode a CO $Y(t)$. Obviously, this doubles the number of variables encoding the collections of outputs compared to (6). To avoid doubling the resources used for the encoding, we propose using two interconnected registers for storing the codes of COs. This approach results in FSM $U_3$ (Figure 6).
In FSM $U_3$, a block LZ implements SBF (7). There is a distributed register RZ inside the block LZ. The register keeps the codes of the COs $Y(t+1)$. This explains the presence of the pulses Clk and Res entering LZ. The variables $z_r \in Z$ are inputs of both a block LY and a register RV. The block LY implements SBF (8). The register RV de facto transforms the variables $z_r \in Z$ into the variables $v_r \in V$ representing the codes of the COs $Y(t)$. As follows from Figure 6, the same pulses Clk and Res are used by both registers. A block LT generates the state variables $T_r \in T$ represented by an SBF
$$T = T(Z, V). \tag{12}$$
There are the following product terms in the SOPs of the SBF (12):
$$E_h = Z_h V_h \quad (h \in \{1, \ldots, H_{ZV}\}). \tag{13}$$
In (13), the symbols $Z_h$ and $V_h$ stand for conjunctions of the variables $z_r \in Z$ and $v_r \in V$, respectively. As we show a bit later, the following condition can take place: $H < H_{ZV}$.
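The interplay of the two registers can be mimicked by a toy behavioral model: on every clock, RV takes over the old content of RZ while RZ loads the new CO code. The class below is a sketch under our own assumptions (all-zero reset value, string codes); it is not a description of the actual circuit.

```python
class CoRegisters:
    """Toy model of the registers RZ and RV: RV always lags RZ by one clock."""

    def __init__(self, width: int) -> None:
        self.width = width
        self.reset()

    def reset(self) -> None:
        # Assumption: Res clears both registers to the all-zero code.
        self.z = "0" * self.width
        self.v = "0" * self.width

    def clock(self, next_z: str) -> None:
        # On Clk, RV captures the old content of RZ, then RZ loads the new code.
        self.v, self.z = self.z, next_z


regs = CoRegisters(width=3)
for code in ["100", "110", "101"]:
    regs.clock(code)
    print(regs.v, regs.z)
```

After each clock, the pair `(v, z)` holds the codes of two COs from adjacent cycles, which is the information the block generating the state variables works with.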
In this paper, we propose a synthesis method for $U_3$-based Mealy FSMs. We assume that the FSM to be synthesized is represented by its STG. The proposed method includes the following steps:
  • Constructing the STT corresponding to an initial STG.
  • Executing the state assignment using maximum binary codes $K(a_m)$.
  • Encoding of the collections of outputs $Y_q \subseteq Y$ by binary codes $K(Y_q)$.
  • Finding the SBF $Y = Y(Z)$.
  • Creating the modified DST of FSM $U_1$.
  • Creating a table of pairs $P_g = \langle Y_i, Y_j \rangle$ corresponding to pairs $\langle a_m, X_h \rangle$.
  • Creating a table representing the block LZ and the SBF $Z = Z(T, X)$.
  • Creating a table representing the block LT and the SBF $T = T(Z, V)$.
  • Implementing the LUT-based circuit of Mealy FSM $U_3$ using internal resources of a particular FPGA chip.
Let us analyze the complexity of the proposed method. Because each FSM transition should be transformed into a pair of COs, the time of synthesis depends on the number of FSM transitions. The synthesis algorithm does not include iterations. The pairs of COs are formed strictly sequentially: at each moment of time, the next in line transition is transformed into a pair of COs. In this regard, the algorithm has a linear character.

5. Example of Synthesis

We use the symbol $U_i(S_a)$ to show that the model $U_i$ ($i \in \{1, 2, 3\}$) of Mealy FSM is used to implement the circuit of an FSM $S_a$. Let us consider an example of the synthesis of Mealy FSM $U_3(S_1)$ shown in Figure 7. We use 4-LUTs to implement the circuit.
Using an STG, we can find the sets of states, inputs and outputs, as well as the number of interstate transitions. Using Figure 7, we can find the sets $A = \{a_1, \ldots, a_5\}$, $X = \{x_1, x_2, x_3\}$ and $Y = \{y_1, \ldots, y_6\}$. This gives the following values: $M = 5$, $L = 3$, and $N = 6$. The analysis of Figure 7 shows that there are $H = 9$ transitions between the states of FSM $S_1$. Naturally, the state $a_1 \in A$ is the initial state.
Step 1. The transformation of an STG into an equivalent STT is executed in the trivial way [16]. As follows from Figure 2, each arc of the STG is transformed into a row of the corresponding STT. In our case, Table 2 is the STT of Mealy FSM $S_1$ corresponding to the STG shown in Figure 7.
In the column $Y_h$ of Table 2, we show the collections of outputs $Y_q \subseteq Y$. As a rule, such information is not given in the classical STT [5].
Step 2. For FSM $S_1$, there is $M = 5$. Using (1) gives $R = 3$. This determines the set of state variables $T = \{T_1, T_2, T_3\}$. It is possible to encode the states in a way that optimizes the system (7). For example, this can be done using the algorithm JEDI [44]. In our simple example, we use the trivial state assignment [5] with the following state codes: $K(a_1) = 000$, $K(a_2) = 001$, …, $K(a_5) = 100$.
Step 3. Using Table 2, we can find the following collections of outputs: $Y_0 = \emptyset$, $Y_1 = \{y_1, y_2\}$, $Y_2 = \{y_3\}$, $Y_3 = \{y_2, y_4\}$, $Y_4 = \{y_3, y_5\}$, and $Y_5 = \{y_1, y_3, y_6\}$. So, in our example, there is $Q = 6$.
As shown in [13], it is necessary to encode the collections in a way that minimizes the number of literals in the functions from (8). If the condition
$$R_Q > I \tag{14}$$
holds, then such an approach could minimize the LUT count for the block LY [13,15,37].
To encode the COs, we use the approach proposed in [55]. The outcome of encoding is shown in Figure 8.
Step 4. Using the distribution of FSM outputs by COs and codes (Figure 8), we obtain the following SBF:
$$\begin{aligned} y_1 &= Y_1 \vee Y_5 = z_1 \bar{z}_2; & y_2 &= Y_1 \vee Y_3 = z_1 \bar{z}_3; & y_3 &= Y_2 \vee Y_4 \vee Y_5 = z_3; \\ y_4 &= Y_3 = z_1 z_2; & y_5 &= Y_4 = z_2 z_3; & y_6 &= Y_5 = z_1 z_3. \end{aligned} \tag{15}$$
The analysis of (15) shows that there are 11 literals in this system. So, there are 11 interconnections between the blocks LZ and LY. As shown in [13], in the common case, there are $N \cdot R_Q$ interconnections between these blocks, which gives $6 \cdot 3 = 18$ for $N = 6$ and $R_Q = 3$. Therefore, using the approach [55] reduces the number of interconnections by a factor of about 1.64.
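System (15) and its literal count can be cross-checked with a short script. The CO contents come from Step 3. The codes are an assumption: only $K(Y_0) = 000$ and $K(Y_1) = 100$ are stated explicitly in the text, and the remaining four codes are inferred here from (15) because Figure 8 is not reproduced.

```python
# CO contents are those listed in Step 3; codes for Y2..Y5 are inferred (assumption).
COS = {
    "Y0": set(), "Y1": {"y1", "y2"}, "Y2": {"y3"},
    "Y3": {"y2", "y4"}, "Y4": {"y3", "y5"}, "Y5": {"y1", "y3", "y6"},
}
CODES = {"Y0": "000", "Y1": "100", "Y2": "001", "Y3": "110", "Y4": "011", "Y5": "101"}

# System (15): each output as a list of (variable index, required value) literals.
SBF15 = {
    "y1": [(1, 1), (2, 0)], "y2": [(1, 1), (3, 0)], "y3": [(3, 1)],
    "y4": [(1, 1), (2, 1)], "y5": [(2, 1), (3, 1)], "y6": [(1, 1), (3, 1)],
}


def outputs_for(code: str) -> set[str]:
    """Evaluate every function of (15) on one CO code z1 z2 z3."""
    return {
        y for y, lits in SBF15.items()
        if all(int(code[i - 1]) == v for i, v in lits)
    }


consistent = all(outputs_for(CODES[q]) == COS[q] for q in COS)
literals = sum(len(lits) for lits in SBF15.values())
print(consistent, literals)  # True 11
```

Under the inferred codes, every CO decodes to exactly the outputs listed in Step 3, and the literal count agrees with the 11 literals mentioned in the text.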
Step 5. The columns of a DST are shown in Figure 2c. We have modified the traditional DST. The column $Y_h$ is replaced by a column $Z_h$ (Table 3).
Step 6. The first five steps of this example are performed using known techniques [13,15]. Starting from the sixth step, the features of our method appear. Since we propose to represent the terms of IMFs in the form of conjunctions corresponding to the codes of COs at adjacent operation cycles, it is necessary to find these pairs of COs. For this purpose, a table of pairs should be built.
A table of pairs $P_g = \langle Y_i, Y_j \rangle$ shows the correspondence between these pairs and the pairs $\langle a_m, X_h \rangle$. There are six columns in this table: $a_m$ (a current state); $a_T$ (a transition state); $Y_m$ (a CO produced during the transition into the state $a_m$); $Y_T$ (a CO produced during the interstate transition $\langle a_m, a_T \rangle$); $P_g$ (a pair $\langle Y_m, Y_T \rangle$); and $g$ (the number of a table row, $g \in \{1, \ldots, G\}$). The following condition holds:
$$G \ge H. \tag{16}$$
For example, in the discussed case, there is $G = 12$ (Table 4).
Let us explain why the relation (16) holds. For example, there is a single transition $\langle a_3, a_4 \rangle$ in Table 2. This transition is accompanied by the CO $Y_5$. At the same time, there are two rows in Table 4 representing this transition. This is explained by the fact that two different COs are produced during the transitions into $a_3 \in A$. As follows from either the STG (Figure 7) or the STT (Table 2), the transition $\langle a_1, a_3 \rangle$ is accompanied by generating the CO $Y_2$ (row 2 of Table 2), and the transition $\langle a_2, a_3 \rangle$ is accompanied by generating the CO $Y_3$ (row 3 of Table 2). Due to this, the transition $\langle a_3, a_4 \rangle$ is represented by the pairs $P_6 = \langle Y_2, Y_5 \rangle$ and $P_7 = \langle Y_3, Y_5 \rangle$.
A similar analysis allows filling all rows of Table 4. Each transition from the states $a_1, a_2, a_5 \in A$ is represented by a single pair. However, the two transitions from $a_4 \in A$ are represented by the four pairs $P_8$–$P_{11}$ (Table 4).
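The expansion of transitions into pairs of COs can be sketched as follows. The function `build_pairs` and the tuple layout are our own assumptions, and the input is only the three-transition fragment quoted above, since the full Table 2 is not reproduced here.

```python
from collections import defaultdict


def build_pairs(stt):
    """Expand STT rows (a_m, a_T, Y_T) into rows (a_m, a_T, Y_m, Y_T).

    Y_m ranges over every CO produced on some transition *into* a_m.
    """
    incoming = defaultdict(list)
    for _, target, co in stt:
        if co not in incoming[target]:
            incoming[target].append(co)
    pairs = []
    for source, target, co in stt:
        for prev_co in incoming[source]:
            pairs.append((source, target, prev_co, co))
    return pairs


# Only the three transitions quoted in the text; the full Table 2 is not reproduced.
fragment = [("a1", "a3", "Y2"), ("a2", "a3", "Y3"), ("a3", "a4", "Y5")]
for row in build_pairs(fragment):
    print(row)
```

For this fragment, the single transition from $a_3$ to $a_4$ expands into the two pairs $\langle Y_2, Y_5 \rangle$ and $\langle Y_3, Y_5 \rangle$. States with no recorded incoming CO in the fragment produce no rows.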
Step 7. The table of block $L_Z$ (Table 5) is created using the modified DST (Table 3). This table includes only a part of the DST columns: $a_m$, $K(a_m)$, $X_h$, $Z_h$ and $h$.
Obviously, the SOPs of SBF (7) include the product terms $F_h = A_m X_h$ ($h \in \{1, \ldots, H\}$). Using Table 5 gives the following minimized SOPs:
$$\begin{aligned} z_1 &= (F_1 \vee F_3) \vee F_6 = \bar{T}_1 \bar{T}_2 x_1 \vee \bar{T}_1 T_2 \bar{T}_3; \\ z_2 &= F_3 \vee F_5 \vee F_7; \\ z_3 &= (F_2 \vee F_4 \vee F_5) \vee F_6 \vee F_7 = \bar{T}_1 \bar{T}_2 \bar{x}_1 \vee T_2 \bar{T}_3 \vee T_2 T_3 x_3. \end{aligned} \qquad (17)$$
Step 8. This step is present only in our proposed method. As follows from (12), the state variables $T_r \in T$ depend on the variables encoding COs. So, it is necessary to construct a table reflecting this dependence. To do this, each transition from the initial state transition table (Table 2) must be represented as a transition between the COs of adjacent operation cycles. This dependence is shown in the table of $L_T$.
The table of $L_T$ includes seven columns: $Y_m$, $K(Y_m)$, $Y_T$, $K(Y_T)$, $a_T$, $T_g$, and $g$. This table is constructed using the columns $Y_m$, $Y_T$, $a_T$ of the table of pairs (Table 4), the codes of COs (Figure 8) and the state codes $K(a_T)$. In the discussed case, there are $G = 12$ rows in this table (Table 6).
For example, the following relations hold for the first row of Table 4: $Y_m = Y_0$, $Y_T = Y_1$ and $a_T = a_2$. As follows from Figure 8, the codes are $K(Y_0) = 000$ and $K(Y_1) = 100$. As follows, for example, from Table 5, the state code is $K(a_2) = 001$. So, the column $T_g$ of Table 6 contains the symbol $T_3$ for the row $g = 1$. All other rows are filled in the same manner.
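Filling one row of the table of $L_T$ can be illustrated with a short Python sketch. The helper names `excited_variables` and `lt_row` and the dictionary layout are assumptions made for illustration; the codes used in the example call are the ones quoted above for the row $g = 1$, and the bits of a state code are assumed to be ordered $T_1, T_2, T_3$ from left to right.

```python
def excited_variables(next_state_code):
    """List the state variables T_r equal to 1 in K(a_T); bits are ordered T1, T2, ..."""
    return [f"T{r}" for r, bit in enumerate(next_state_code, start=1) if bit == "1"]


def lt_row(g, y_m, k_y_m, y_t, k_y_t, a_t, k_a_t):
    """Assemble one row of the L_T table as a dictionary (layout is an assumption)."""
    return {
        "Y_m": y_m, "K(Y_m)": k_y_m,
        "Y_T": y_t, "K(Y_T)": k_y_t,
        "a_T": a_t, "T_g": excited_variables(k_a_t),
        "g": g,
    }


# Row g = 1: Y_m = Y0 (000), Y_T = Y1 (100), a_T = a2 with K(a2) = 001.
row1 = lt_row(1, "Y0", "000", "Y1", "100", "a2", "001")
print(row1["T_g"])  # ['T3']
```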
Using the table of $L_T$, the SBF (12) is derived. The SOPs of the corresponding functions include the terms (13). In the discussed case, this is the following SBF:
$$\begin{aligned} T_1 &= P_5 \vee P_8 \vee P_{10} = v_1 \bar{v}_2 \bar{v}_3 \bar{z}_1 z_2 \vee \bar{v}_1 \bar{v}_2 v_3 \bar{z}_1 z_2 \vee v_1 v_3 \bar{z}_1 z_2; \\ T_2 &= P_2 \vee P_3 \vee P_4 \vee P_6 \vee P_7; \\ T_3 &= P_4 \vee P_6 \vee P_7. \end{aligned} \qquad (18)$$
Step 9. To obtain the LUT-based circuit of Mealy FSM $U_3(S_1)$, the step of technology mapping should be executed [31]. This can be done only with the help of industrial CAD tools. In the case of Virtex-7-based circuits, the industrial package Vivado [47] should be used. This CAD tool executes the process of technology mapping. As a result, we can extract the real characteristics of an FSM circuit (such as the LUT count, number of slices, number of flip-flops, maximum operating frequency, and power consumption) from the Vivado reports.
Vivado supports FPGAs starting from the Virtex-7 family. So, it is impossible to use Vivado for implementing the circuit of FSM $U_3(S_1)$ using LUTs with four inputs. The next section presents the results of experiments conducted with the help of Vivado and the library of standard benchmark FSMs [56].

6. Experimental Results

In this section, we present the results of experiments conducted to compare the characteristics of $U_3$-based Mealy FSMs with the characteristics of FSM circuits based on other models. The benchmark FSMs from the library [56] are used for these experiments. This library includes 48 benchmarks taken from the practice of logic design and represented in the KISS2 format. Although the library dates back to the 1990s, it has been used by various authors for 30 years to compare new and existing methods of implementing FSM circuits. Works that use the library [56] in experimental research include, for example, the articles [31,35,52,57,58,59] and the monographs [6,39]. The basic characteristics of the benchmarks are shown in Table 7.
To conduct the research, we use a personal computer with the following characteristics: CPU, Intel Core i7 6700K [email protected] GHz; memory, 16 GB RAM, 2400 MHz, CL15. As a platform for implementing the FSM circuits, the Virtex-7 VC709 Evaluation Platform (xc7vx690tffg1761-2) [60] is used. As a CAD tool, we use the package Vivado v2019.1 (64-bit) of Xilinx [47]. The circuits are implemented using CLBs from the slices SLICEL, which include LUTs having six inputs. To create the tables with research results, the reports of Vivado are used. To link the initial KISS2-based files with Vivado, we create VHDL-based descriptions of these models using the CAD tool K2F [40].
Using the Vivado reports, we compare some parameters of the produced FSM circuits. These parameters are (1) the chip area occupied by an FSM circuit (represented by the LUT count); (2) the maximum operating frequency achievable for a particular FSM; (3) the required number of flip-flops; (4) the power consumption; (5) the area–time products; and (6) the power–time products. In our experiments, we use the following FSM models: (1) auto of Vivado (it is based on the maximum binary state codes); (2) one-hot of Vivado (in this case, $R = M$); (3) JEDI; (4) $U_2$-based FSMs [15]; and (5) $U_3$-based FSMs proposed in this article.
As in our previous research [15], the benchmark FSMs are divided into five groups. The groups are determined by the value of a parameter $D(R, L, I)$. This parameter is calculated as
$$D(R, L, I) = L + R - I. \qquad (19)$$
Using (19), we create the following groups. The relation $D(R, L, I) \leq 0$ determines the group of trivial FSMs (the group G0). The relation $0 < D(R, L, I) \leq 6$ determines the group of simple FSMs (G1). The relation $6 < D(R, L, I) \leq 12$ determines the group of average FSMs (G2). The relation $12 < D(R, L, I) \leq 18$ determines the group of big FSMs (G3). The relation $D(R, L, I) > 18$ determines the group of very big FSMs (G4). As research [15] shows, the larger the group number, the greater the gain from the use of methods of structural decomposition.
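The grouping rule can be written as a few lines of Python. This is a minimal sketch: the function names are hypothetical, the default $I = 6$ reflects the six-input LUTs used in the experiments, and the values of $L$ and $R$ in the example calls are invented and do not describe any particular benchmark.

```python
def decomposition_parameter(L, R, I=6):
    """D(R, L, I) = L + R - I for L inputs, R state variables and I-input LUTs."""
    return L + R - I


def fsm_group(L, R, I=6):
    """Return the group number 0..4 according to the thresholds on D(R, L, I)."""
    d = decomposition_parameter(L, R, I)
    if d <= 0:
        return 0  # trivial FSMs
    if d <= 6:
        return 1  # simple FSMs
    if d <= 12:
        return 2  # average FSMs
    if d <= 18:
        return 3  # big FSMs
    return 4      # very big FSMs


# Hypothetical FSMs (the L and R values are illustrative only).
print(fsm_group(L=2, R=3))   # D = -1
print(fsm_group(L=7, R=5))   # D = 6
print(fsm_group(L=19, R=6))  # D = 19
```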
The results of the experiments are shown in Tables 8–20. We have organized these tables in the following way. The table columns are marked by the names of the methods used. The table rows are marked by the names of the benchmarks. At the intersection of a column with a method and a row with a benchmark, we show the result of a specific experiment obtained from the Vivado report. Inside each table, the benchmarks are sorted by ascending group number and listed in alphabetical order within each group. The row “Total” contains the sum of the values of each column. The row “Percentage” shows the total of each column as a percentage of the total obtained for the $U_3$-based FSMs. We use the model of Mealy FSM $U_1$ for the methods auto, one-hot, and JEDI.
These tables include the following information: (1) the LUT counts for all benchmarks (Table 8); (2) the LUT counts for benchmarks of the group G0 (Table 9); (3) the LUT counts for benchmarks of the group G1 (Table 10); (4) the LUT counts for benchmarks of groups G2, G3 and G4 (Table 11); (5) the maximum operating frequency for all benchmarks (Table 12); (6) the maximum operating frequency for benchmarks of the group G0 (Table 13); (7) the maximum operating frequency for benchmarks of the group G1 (Table 14); and (8) the maximum operating frequency for benchmarks of the groups G2–G4 (Table 15).
To fill in the tables with the research results, we use data from our previous articles. Basically, all numbers are taken from papers [13,61]. However, information about the number of flip-flops is mentioned only in paper [14]. Therefore, we used Ref. [14] to fill in Table 16. The necessary information regarding the proposed method is taken from the Vivado reports.
The following conclusions can be made from the analysis of Tables 8–15.
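Table 8. Experimental results (LUT counts for all benchmarks).

| Benchmark | Auto [13] | One-Hot [13] | JEDI [14] | $U_2$ [61] | Our Approach | Group |
|---|---|---|---|---|---|---|
| bbtas | 5 | 5 | 5 | 8 | 8 | 0 |
| dk17 | 5 | 12 | 5 | 8 | 8 | 0 |
| dk27 | 3 | 5 | 4 | 7 | 7 | 0 |
| dk512 | 10 | 10 | 9 | 12 | 13 | 0 |
| ex3 | 9 | 9 | 9 | 11 | 11 | 0 |
| ex5 | 9 | 9 | 9 | 10 | 10 | 0 |
| lion | 2 | 5 | 2 | 6 | 6 | 0 |
| lion9 | 6 | 11 | 5 | 8 | 8 | 0 |
| mc | 4 | 7 | 4 | 6 | 6 | 0 |
| modulo12 | 7 | 7 | 7 | 9 | 9 | 0 |
| shiftreg | 2 | 6 | 2 | 4 | 4 | 0 |
| bbara | 17 | 17 | 10 | 10 | 11 | 1 |
| bbsse | 33 | 37 | 24 | 26 | 24 | 1 |
| beecount | 19 | 19 | 14 | 14 | 13 | 1 |
| cse | 40 | 66 | 36 | 33 | 32 | 1 |
| dk14 | 16 | 27 | 10 | 12 | 11 | 1 |
| dk15 | 15 | 16 | 12 | 6 | 7 | 1 |
| dk16 | 15 | 34 | 12 | 11 | 10 | 1 |
| donfile | 31 | 31 | 24 | 21 | 20 | 1 |
| ex2 | 9 | 9 | 8 | 8 | 9 | 1 |
| ex4 | 15 | 13 | 12 | 11 | 10 | 1 |
| ex6 | 24 | 36 | 22 | 21 | 20 | 1 |
| ex7 | 4 | 5 | 4 | 6 | 7 | 1 |
| keyb | 43 | 61 | 40 | 37 | 38 | 1 |
| mark1 | 23 | 23 | 20 | 19 | 18 | 1 |
| opus | 28 | 28 | 22 | 21 | 20 | 1 |
| s27 | 6 | 18 | 6 | 6 | 7 | 1 |
| s386 | 26 | 39 | 22 | 25 | 22 | 1 |
| s8 | 9 | 9 | 9 | 9 | 10 | 1 |
| sse | 33 | 37 | 30 | 26 | 25 | 1 |
| ex1 | 70 | 74 | 53 | 40 | 34 | 2 |
| kirkman | 42 | 58 | 39 | 33 | 29 | 2 |
| planet | 131 | 131 | 88 | 78 | 71 | 2 |
| planet1 | 131 | 131 | 88 | 78 | 71 | 2 |
| pma | 94 | 94 | 86 | 72 | 68 | 2 |
| s1 | 65 | 99 | 61 | 54 | 50 | 2 |
| s1488 | 124 | 131 | 108 | 89 | 86 | 2 |
| s1494 | 126 | 132 | 110 | 90 | 80 | 2 |
| s1a | 49 | 81 | 43 | 38 | 34 | 2 |
| s208 | 12 | 31 | 10 | 9 | 11 | 2 |
| styr | 93 | 120 | 81 | 70 | 62 | 2 |
| tma | 45 | 39 | 39 | 30 | 29 | 2 |
| sand | 132 | 132 | 114 | 99 | 82 | 3 |
| s420 | 10 | 31 | 9 | 8 | 9 | 4 |
| s510 | 48 | 48 | 32 | 22 | 20 | 4 |
| s820 | 88 | 82 | 68 | 52 | 48 | 4 |
| s832 | 80 | 79 | 62 | 50 | 46 | 4 |
| Total | 1808 | 2104 | 1489 | 1323 | 1234 | |
| Percentage, % | 146.52 | 170.50 | 120.66 | 107.21 | 100.00 | |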
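Table 9. Experimental results (LUT counts for benchmarks of G0).

| Benchmark | Auto [13] | One-Hot [13] | JEDI [14] | $U_2$ [61] | Our Approach |
|---|---|---|---|---|---|
| bbtas | 5 | 5 | 5 | 8 | 8 |
| dk17 | 5 | 12 | 5 | 8 | 8 |
| dk27 | 3 | 5 | 4 | 7 | 7 |
| dk512 | 10 | 10 | 9 | 12 | 13 |
| ex3 | 9 | 9 | 9 | 11 | 11 |
| ex5 | 9 | 9 | 9 | 10 | 10 |
| lion | 2 | 5 | 2 | 6 | 6 |
| lion9 | 6 | 11 | 5 | 8 | 8 |
| mc | 4 | 7 | 4 | 6 | 6 |
| modulo12 | 7 | 7 | 7 | 9 | 9 |
| shiftreg | 2 | 6 | 2 | 4 | 4 |
| Total | 62 | 86 | 61 | 89 | 90 |
| Percentage, % | 68.89 | 95.56 | 67.78 | 98.89 | 100.00 |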
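Table 10. Experimental results (LUT counts for benchmarks of G1).

| Benchmark | Auto [13] | One-Hot [13] | JEDI [14] | $U_2$ [61] | Our Approach |
|---|---|---|---|---|---|
| bbara | 17 | 17 | 10 | 10 | 11 |
| bbsse | 33 | 37 | 24 | 26 | 24 |
| beecount | 19 | 19 | 14 | 14 | 13 |
| cse | 40 | 66 | 36 | 33 | 32 |
| dk14 | 16 | 27 | 10 | 12 | 11 |
| dk15 | 15 | 16 | 12 | 6 | 7 |
| dk16 | 15 | 34 | 12 | 11 | 10 |
| donfile | 31 | 31 | 24 | 21 | 20 |
| ex2 | 9 | 9 | 8 | 8 | 9 |
| ex4 | 15 | 13 | 12 | 11 | 10 |
| ex6 | 24 | 36 | 22 | 21 | 20 |
| ex7 | 4 | 5 | 4 | 6 | 7 |
| keyb | 43 | 61 | 40 | 37 | 38 |
| mark1 | 23 | 23 | 20 | 19 | 18 |
| opus | 28 | 28 | 22 | 21 | 20 |
| s27 | 6 | 18 | 6 | 6 | 7 |
| s386 | 26 | 39 | 22 | 25 | 22 |
| s8 | 9 | 9 | 9 | 9 | 10 |
| sse | 33 | 37 | 30 | 26 | 25 |
| Total | 406 | 525 | 337 | 322 | 314 |
| Percentage, % | 129.30 | 167.20 | 107.32 | 102.55 | 100.00 |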
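Table 11. Experimental results (LUT counts for benchmarks of G2–G4).

| Benchmark | Auto [13] | One-Hot [13] | JEDI [14] | $U_2$ [61] | Our Approach |
|---|---|---|---|---|---|
| ex1 | 70 | 74 | 53 | 40 | 34 |
| kirkman | 42 | 58 | 39 | 33 | 29 |
| planet | 131 | 131 | 88 | 78 | 71 |
| planet1 | 131 | 131 | 88 | 78 | 71 |
| pma | 94 | 94 | 86 | 72 | 68 |
| s1 | 65 | 99 | 61 | 54 | 50 |
| s1488 | 124 | 131 | 108 | 89 | 86 |
| s1494 | 126 | 132 | 110 | 90 | 80 |
| s1a | 49 | 81 | 43 | 38 | 34 |
| s208 | 12 | 31 | 10 | 9 | 11 |
| styr | 93 | 120 | 81 | 70 | 62 |
| tma | 45 | 39 | 39 | 30 | 29 |
| sand | 132 | 132 | 114 | 99 | 82 |
| s420 | 10 | 31 | 9 | 8 | 9 |
| s510 | 48 | 48 | 32 | 22 | 20 |
| s820 | 88 | 82 | 68 | 52 | 48 |
| s832 | 80 | 79 | 62 | 50 | 46 |
| Total | 1340 | 1493 | 1091 | 912 | 830 |
| Percentage, % | 161.45 | 179.88 | 131.45 | 109.88 | 100.00 |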
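Table 12. Experimental results (the maximum operating frequency for all benchmarks, MHz).

| Benchmark | Auto [13] | One-Hot [13] | JEDI [14] | $U_2$ [14] | Our Approach | Group |
|---|---|---|---|---|---|---|
| bbtas | 204.16 | 204.16 | 206.12 | 200.38 | 199.48 | 0 |
| dk17 | 199.28 | 167 | 199.39 | 199.87 | 198.03 | 0 |
| dk27 | 206.02 | 201.9 | 204.18 | 196.65 | 194.18 | 0 |
| dk512 | 196.27 | 196.27 | 199.75 | 194.17 | 192.34 | 0 |
| ex3 | 194.86 | 194.86 | 195.76 | 191.22 | 188.14 | 0 |
| ex5 | 180.25 | 180.25 | 181.16 | 178.06 | 176.74 | 0 |
| lion | 202.43 | 204 | 202.35 | 200.18 | 199.12 | 0 |
| lion9 | 205.3 | 185.22 | 206.38 | 199.12 | 197.07 | 0 |
| mc | 196.66 | 195.47 | 196.87 | 193.17 | 191.52 | 0 |
| modulo12 | 207 | 207 | 207.13 | 201.12 | 200.23 | 0 |
| shiftreg | 262.67 | 263.57 | 276.26 | 256.69 | 253.24 | 0 |
| bbara | 193.39 | 193.39 | 212.21 | 202.23 | 201.12 | 1 |
| bbsse | 157.06 | 169.12 | 182.34 | 181.23 | 178.64 | 1 |
| beecount | 166.61 | 166.61 | 187.32 | 185.14 | 183.72 | 1 |
| cse | 146.43 | 163.64 | 178.12 | 175.18 | 172.42 | 1 |
| dk14 | 191.64 | 172.65 | 193.85 | 190.18 | 188.86 | 1 |
| dk15 | 192.53 | 185.36 | 194.87 | 192.23 | 191.48 | 1 |
| dk16 | 169.72 | 174.79 | 197.13 | 194.34 | 192.83 | 1 |
| donfile | 184.03 | 184 | 203.65 | 200.92 | 198.47 | 1 |
| ex2 | 198.57 | 198.57 | 200.14 | 198.32 | 197.26 | 1 |
| ex4 | 180.96 | 177.71 | 192.83 | 190.14 | 188.32 | 1 |
| ex6 | 169.57 | 163.8 | 176.59 | 171.27 | 170.18 | 1 |
| ex7 | 200.04 | 200.84 | 200.6 | 198.14 | 197.38 | 1 |
| keyb | 156.45 | 143.47 | 168.43 | 162.01 | 161.25 | 1 |
| mark1 | 162.39 | 162.39 | 176.18 | 170.18 | 168.04 | 1 |
| opus | 166.2 | 166.2 | 178.32 | 175.29 | 173.41 | 1 |
| s27 | 198.73 | 191.5 | 199.13 | 196.13 | 194.17 | 1 |
| s386 | 168.15 | 173.46 | 179.15 | 176.85 | 174.62 | 1 |
| s8 | 180.02 | 178.95 | 181.23 | 178.23 | 176.22 | 1 |
| sse | 157.06 | 169.12 | 174.63 | 170.12 | 167.43 | 1 |
| ex1 | 150.94 | 139.76 | 176.87 | 182.34 | 180.01 | 2 |
| kirkman | 141.38 | 154 | 156.68 | 167.15 | 165.62 | 2 |
| planet | 132.71 | 132.71 | 187.14 | 189.12 | 187.07 | 2 |
| planet1 | 132.71 | 132.71 | 187.14 | 189.12 | 187.07 | 2 |
| pma | 146.18 | 146.18 | 169.83 | 178.19 | 176.26 | 2 |
| s1 | 146.41 | 135.85 | 157.16 | 162.23 | 164.12 | 2 |
| s1488 | 138.5 | 131.94 | 157.18 | 168.32 | 167.14 | 2 |
| s1494 | 149.39 | 145.75 | 164.34 | 172.27 | 170.95 | 2 |
| s1a | 153.37 | 176.4 | 169.17 | 178.21 | 176.25 | 2 |
| s208 | 174.34 | 176.46 | 178.76 | 181.72 | 180.29 | 2 |
| styr | 137.61 | 129.92 | 145.64 | 161.87 | 159.25 | 2 |
| tma | 163.88 | 147.8 | 164.14 | 176.72 | 175.06 | 2 |
| sand | 115.97 | 115.97 | 126.82 | 145.68 | 152.49 | 3 |
| s420 | 173.88 | 176.46 | 177.25 | 187.23 | 190.56 | 4 |
| s510 | 177.65 | 177.65 | 181.42 | 187.32 | 190.24 | 4 |
| s820 | 152 | 153.16 | 176.58 | 181.96 | 183.12 | 4 |
| s832 | 145.71 | 153.23 | 173.78 | 186.12 | 187.45 | 4 |
| Total | 8127.08 | 8061.22 | 8701.97 | 8536.27 | 8658.86 | |
| Percentage, % | 93.86 | 93.10 | 100.50 | 98.58 | 100.00 | |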
Table 13. Experimental results (the maximum operating frequency for G0, MHz).

| Benchmark | Auto [13] | One-Hot [13] | JEDI [14] | $U_2$ [14] | Our Approach |
|---|---|---|---|---|---|
| bbtas | 204.16 | 204.16 | 206.12 | 200.38 | 199.48 |
| dk17 | 199.28 | 167 | 199.39 | 199.87 | 198.03 |
| dk27 | 206.02 | 201.9 | 204.18 | 196.65 | 194.18 |
| dk512 | 196.27 | 196.27 | 199.75 | 194.17 | 192.34 |
| ex3 | 194.86 | 194.86 | 195.76 | 191.22 | 188.14 |
| ex5 | 180.25 | 180.25 | 181.16 | 178.06 | 176.74 |
| lion | 202.43 | 204 | 202.35 | 200.18 | 199.12 |
| lion9 | 205.3 | 185.22 | 206.38 | 199.12 | 197.07 |
| mc | 196.66 | 195.47 | 196.87 | 193.17 | 191.52 |
| modulo12 | 207 | 207 | 207.13 | 201.12 | 200.23 |
| shiftreg | 262.67 | 263.57 | 276.26 | 256.69 | 253.24 |
| Total | 2254.90 | 2199.70 | 2275.35 | 2032.57 | 2190.09 |
| Percentage, % | 102.96 | 100.44 | 103.89 | 92.81 | 100.00 |
Table 14. Experimental results (the maximum operating frequency for G1, MHz).

| Benchmark | Auto [13] | One-Hot [13] | JEDI [14] | $U_2$ [14] | Our Approach |
|---|---|---|---|---|---|
| bbara | 193.39 | 193.39 | 212.21 | 202.23 | 201.12 |
| bbsse | 157.06 | 169.12 | 182.34 | 181.23 | 178.64 |
| beecount | 166.61 | 166.61 | 187.32 | 185.14 | 183.72 |
| cse | 146.43 | 163.64 | 178.12 | 175.18 | 172.42 |
| dk14 | 191.64 | 172.65 | 193.85 | 190.18 | 188.86 |
| dk15 | 192.53 | 185.36 | 194.87 | 192.23 | 191.48 |
| dk16 | 169.72 | 174.79 | 197.13 | 194.34 | 192.83 |
| donfile | 184.03 | 184 | 203.65 | 200.92 | 198.47 |
| ex2 | 198.57 | 198.57 | 200.14 | 198.32 | 197.26 |
| ex4 | 180.96 | 177.71 | 192.83 | 190.14 | 188.32 |
| ex6 | 169.57 | 163.8 | 176.59 | 171.27 | 170.18 |
| ex7 | 200.04 | 200.84 | 200.6 | 198.14 | 197.38 |
| keyb | 156.45 | 143.47 | 168.43 | 162.01 | 161.25 |
| mark1 | 162.39 | 162.39 | 176.18 | 170.18 | 168.04 |
| opus | 166.2 | 166.2 | 178.32 | 175.29 | 173.41 |
| s27 | 198.73 | 191.5 | 199.13 | 196.13 | 194.17 |
| s386 | 168.15 | 173.46 | 179.15 | 176.85 | 174.62 |
| s8 | 180.02 | 178.95 | 181.23 | 178.23 | 176.22 |
| sse | 157.06 | 169.12 | 174.63 | 170.12 | 167.43 |
| Total | 3339.55 | 3335.57 | 3576.72 | 3508.13 | 3475.82 |
| Percentage, % | 96.08 | 95.96 | 102.90 | 100.93 | 100.00 |
Table 15. Experimental results (the maximum operating frequency for G2–G4, MHz).

| Benchmark | Auto [13] | One-Hot [13] | JEDI [14] | $U_2$ [14] | Our Approach |
|---|---|---|---|---|---|
| ex1 | 150.94 | 139.76 | 176.87 | 182.34 | 180.01 |
| kirkman | 141.38 | 154 | 156.68 | 167.15 | 165.62 |
| planet | 132.71 | 132.71 | 187.14 | 189.12 | 187.07 |
| planet1 | 132.71 | 132.71 | 187.14 | 189.12 | 187.07 |
| pma | 146.18 | 146.18 | 169.83 | 178.19 | 176.26 |
| s1 | 146.41 | 135.85 | 157.16 | 162.23 | 164.12 |
| s1488 | 138.5 | 131.94 | 157.18 | 168.32 | 167.14 |
| s1494 | 149.39 | 145.75 | 164.34 | 172.27 | 170.95 |
| s1a | 153.37 | 176.4 | 169.17 | 178.21 | 176.25 |
| s208 | 174.34 | 176.46 | 178.76 | 181.72 | 180.29 |
| styr | 137.61 | 129.92 | 145.64 | 161.87 | 159.25 |
| tma | 163.88 | 147.8 | 164.14 | 176.72 | 175.06 |
| sand | 115.97 | 115.97 | 126.82 | 145.68 | 152.49 |
| s420 | 173.88 | 176.46 | 177.25 | 187.23 | 190.56 |
| s510 | 177.65 | 177.65 | 181.42 | 187.32 | 190.24 |
| s820 | 152 | 153.16 | 176.58 | 181.96 | 183.12 |
| s832 | 145.71 | 153.23 | 173.78 | 186.12 | 187.45 |
| Total | 2532.63 | 2525.95 | 2849.90 | 2995.57 | 2992.95 |
| Percentage, % | 84.62 | 84.40 | 95.22 | 100.09 | 100.00 |
As follows from Table 8, the $U_3$-based FSMs require fewer LUTs than the other investigated methods. Relative to our approach, the equivalent auto-based FSMs require 46.52% more 6-LUTs, the one-hot-based FSMs require 70.50% more, and the JEDI-based FSMs require 20.66% more. Additionally, the equivalent $U_2$-based FSMs require 7.21% more 6-LUTs than our approach. However, the amount of gain (or loss) depends on the group that a particular benchmark belongs to.
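The “Percentage” row can be reproduced from the “Total” row with a short Python sketch. The function names are hypothetical; the totals are the ones reported in Table 8. The second helper shows, as an additional illustration, how the same totals translate into a relative reduction measured against another method.

```python
def percentage_row(totals, reference="U3"):
    """Express every column total as a percentage of the reference total."""
    ref = totals[reference]
    return {name: round(100.0 * value / ref, 2) for name, value in totals.items()}


def reduction_percent(totals, other, reference="U3"):
    """Relative reduction (in %) achieved by the reference with respect to `other`."""
    return round(100.0 * (1.0 - totals[reference] / totals[other]), 2)


# "Total" row of Table 8 (LUT counts for all benchmarks).
lut_totals = {"auto": 1808, "one-hot": 2104, "JEDI": 1489, "U2": 1323, "U3": 1234}

print(percentage_row(lut_totals))
print(reduction_percent(lut_totals, "auto"))
```

With these totals, the auto-based circuits need 146.52% of the LUTs used by the $U_3$-based circuits, which corresponds to a reduction of about 31.75% when the auto-based total is taken as the baseline.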
As follows from Table 9, our approach loses to all other investigated methods for the group G0. According to the percentage row of Table 9, the losses are 31.11% relative to auto-based FSMs; 4.44% relative to one-hot-based FSMs; 32.22% relative to JEDI-based FSMs; and 1.11% relative to $U_2$-based FSMs. So, it does not make sense to use the $U_3$-based FSMs to implement the circuits for FSMs of the group G0.
Let us explain the reasons for these losses. Comparing the results for group G0 shows that both multilevel approaches ($U_2$ and $U_3$) lose out to the other methods. For FSM $U_2$, the difference in the percentage row of Table 9 is 30% relative to auto-based FSMs, 3.33% relative to one-hot-based FSMs, and 31.11% relative to JEDI-based FSMs. We explain this by the fact that condition (4) holds for the benchmarks of G0. In this case, only a single LUT is needed to implement any function from SBFs (2) and (3). So, there is no need for the encoding of COs. However, as follows from Figure 4 and Figure 6, this encoding is always used in both multilevel FSMs $U_2$ and $U_3$. Because of this, for the group G0, the multilevel FSMs have higher LUT counts than the other investigated design methods.
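Table 16. Experimental results (number of flip-flops).

| Benchmark | Auto [14] | One-Hot [14] | JEDI [14] | $U_2$ [14] | Our Approach | Group |
|---|---|---|---|---|---|---|
| bbtas | 4 | 9 | 4 | 4 | 6 | 0 |
| dk17 | 4 | 16 | 4 | 4 | 6 | 0 |
| dk27 | 4 | 10 | 4 | 4 | 6 | 0 |
| dk512 | 5 | 24 | 5 | 5 | 6 | 0 |
| ex3 | 4 | 14 | 4 | 4 | 6 | 0 |
| ex5 | 4 | 16 | 4 | 4 | 6 | 0 |
| lion | 3 | 5 | 3 | 3 | 4 | 0 |
| lion9 | 4 | 11 | 4 | 4 | 4 | 0 |
| mc | 3 | 8 | 3 | 3 | 6 | 0 |
| modulo12 | 4 | 12 | 4 | 4 | 6 | 0 |
| shiftreg | 4 | 16 | 4 | 4 | 6 | 0 |
| bbara | 4 | 12 | 4 | 4 | 4 | 1 |
| bbsse | 5 | 26 | 5 | 5 | 8 | 1 |
| beecount | 4 | 10 | 4 | 4 | 6 | 1 |
| cse | 5 | 32 | 5 | 5 | 10 | 1 |
| dk14 | 5 | 26 | 5 | 5 | 8 | 1 |
| dk15 | 5 | 17 | 5 | 5 | 8 | 1 |
| dk16 | 7 | 75 | 7 | 7 | 6 | 1 |
| donfile | 5 | 24 | 5 | 5 | 2 | 1 |
| ex2 | 5 | 25 | 5 | 5 | 4 | 1 |
| ex4 | 5 | 18 | 5 | 5 | 8 | 1 |
| ex6 | 4 | 14 | 4 | 4 | 8 | 1 |
| ex7 | 5 | 17 | 5 | 5 | 4 | 1 |
| keyb | 5 | 22 | 5 | 5 | 10 | 1 |
| mark1 | 5 | 22 | 5 | 5 | 8 | 1 |
| opus | 5 | 18 | 5 | 5 | 8 | 1 |
| s27 | 4 | 11 | 4 | 4 | 4 | 1 |
| s386 | 5 | 23 | 5 | 5 | 10 | 1 |
| s8 | 4 | 15 | 4 | 4 | 4 | 1 |
| sse | 5 | 26 | 5 | 5 | 10 | 1 |
| ex1 | 7 | 80 | 7 | 7 | 12 | 2 |
| kirkman | 6 | 48 | 6 | 6 | 12 | 2 |
| planet | 7 | 86 | 7 | 7 | 12 | 2 |
| planet1 | 7 | 86 | 7 | 7 | 12 | 2 |
| pma | 6 | 49 | 6 | 6 | 10 | 2 |
| s1 | 6 | 54 | 6 | 6 | 12 | 2 |
| s1488 | 7 | 112 | 7 | 7 | 14 | 2 |
| s1494 | 7 | 118 | 7 | 7 | 16 | 2 |
| s1a | 7 | 86 | 7 | 7 | 14 | 2 |
| s208 | 6 | 37 | 6 | 6 | 12 | 2 |
| styr | 7 | 67 | 7 | 7 | 14 | 2 |
| tma | 6 | 63 | 6 | 6 | 10 | 2 |
| sand | 7 | 88 | 7 | 7 | 14 | 3 |
| s420 | 8 | 137 | 8 | 8 | 16 | 4 |
| s510 | 8 | 172 | 8 | 8 | 12 | 4 |
| s820 | 7 | 78 | 7 | 7 | 14 | 4 |
| s832 | 7 | 76 | 7 | 7 | 16 | 4 |
| Total | 251 | 2011 | 251 | 251 | 414 | |
| Percentage, % | 60.63 | 485.75 | 60.63 | 60.63 | 100.00 | |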
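Table 17. Experimental results (number of flip-flops with regard to the register of outputs).

| Benchmark | Auto | One-Hot | JEDI | $U_2$ | Our Approach | Group |
|---|---|---|---|---|---|---|
| bbtas | 6 | 11 | 6 | 6 | 6 | 0 |
| dk17 | 7 | 19 | 7 | 7 | 6 | 0 |
| dk27 | 6 | 12 | 6 | 6 | 6 | 0 |
| dk512 | 8 | 27 | 8 | 8 | 6 | 0 |
| ex3 | 6 | 16 | 6 | 6 | 6 | 0 |
| ex5 | 6 | 18 | 6 | 6 | 6 | 0 |
| lion | 4 | 6 | 4 | 4 | 4 | 0 |
| lion9 | 5 | 12 | 5 | 5 | 4 | 0 |
| mc | 8 | 13 | 8 | 8 | 6 | 0 |
| modulo12 | 5 | 13 | 5 | 5 | 6 | 0 |
| shiftreg | 5 | 17 | 5 | 5 | 6 | 0 |
| bbara | 6 | 14 | 6 | 6 | 4 | 1 |
| bbsse | 12 | 33 | 12 | 12 | 8 | 1 |
| beecount | 12 | 18 | 12 | 12 | 6 | 1 |
| cse | 19 | 46 | 19 | 19 | 10 | 1 |
| dk14 | 10 | 31 | 10 | 10 | 8 | 1 |
| dk15 | 10 | 22 | 10 | 10 | 8 | 1 |
| dk16 | 10 | 78 | 10 | 10 | 6 | 1 |
| donfile | 6 | 25 | 6 | 6 | 2 | 1 |
| ex2 | 7 | 27 | 7 | 7 | 4 | 1 |
| ex4 | 14 | 27 | 14 | 14 | 8 | 1 |
| ex6 | 12 | 22 | 12 | 12 | 8 | 1 |
| ex7 | 7 | 19 | 7 | 7 | 4 | 1 |
| keyb | 12 | 29 | 12 | 12 | 10 | 1 |
| mark1 | 21 | 38 | 21 | 21 | 8 | 1 |
| opus | 11 | 24 | 11 | 11 | 8 | 1 |
| s27 | 5 | 12 | 5 | 5 | 4 | 1 |
| s386 | 12 | 30 | 12 | 12 | 10 | 1 |
| s8 | 5 | 16 | 5 | 5 | 4 | 1 |
| sse | 12 | 33 | 12 | 12 | 10 | 1 |
| ex1 | 26 | 99 | 26 | 26 | 12 | 2 |
| kirkman | 12 | 54 | 12 | 12 | 12 | 2 |
| planet | 26 | 105 | 26 | 26 | 12 | 2 |
| planet1 | 26 | 105 | 26 | 26 | 12 | 2 |
| pma | 14 | 57 | 14 | 14 | 10 | 2 |
| s1 | 13 | 61 | 13 | 13 | 12 | 2 |
| s1488 | 26 | 131 | 26 | 26 | 14 | 2 |
| s1494 | 26 | 137 | 26 | 26 | 16 | 2 |
| s1a | 13 | 92 | 13 | 13 | 14 | 2 |
| s208 | 8 | 39 | 8 | 8 | 12 | 2 |
| styr | 17 | 77 | 17 | 17 | 14 | 2 |
| tma | 15 | 72 | 15 | 15 | 10 | 2 |
| sand | 16 | 97 | 16 | 16 | 14 | 3 |
| s420 | 10 | 139 | 10 | 10 | 16 | 4 |
| s510 | 15 | 179 | 15 | 15 | 12 | 4 |
| s820 | 26 | 97 | 26 | 26 | 14 | 4 |
| s832 | 26 | 95 | 26 | 26 | 16 | 4 |
| Total | 584 | 2344 | 584 | 584 | 414 | |
| Percentage, % | 141.06 | 566.18 | 141.06 | 141.06 | 100 | |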
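Table 18. Experimental results (total on-chip power, Watts).

| Benchmark | Auto [61] | One-Hot [61] | JEDI [61] | $U_2$ [61] | Our Approach | Group |
|---|---|---|---|---|---|---|
| bbtas | 0.533 | 0.533 | 0.533 | 0.661 | 0.681 | 0 |
| dk17 | 1.901 | 1.935 | 1.891 | 2.363 | 2.412 | 0 |
| dk27 | 1.168 | 0.854 | 1.158 | 1.459 | 1.682 | 0 |
| dk512 | 1.496 | 1.496 | 1.345 | 1.708 | 1.824 | 0 |
| ex3 | 0.391 | 0.391 | 0.391 | 0.501 | 0.543 | 0 |
| ex5 | 0.387 | 0.387 | 0.385 | 0.496 | 0.418 | 0 |
| lion | 0.542 | 0.629 | 0.547 | 0.711 | 0.725 | 0 |
| lion9 | 0.733 | 0.97 | 0.728 | 0.939 | 0.839 | 0 |
| mc | 0.447 | 0.561 | 0.443 | 0.567 | 0.743 | 0 |
| modulo12 | 0.559 | 0.559 | 0.563 | 0.715 | 0.797 | 0 |
| shiftreg | 0.523 | 0.603 | 0.512 | 0.645 | 0.875 | 0 |
| bbara | 0.569 | 0.569 | 0.488 | 0.399 | 0.292 | 1 |
| bbsse | 2.22 | 1.206 | 1.713 | 1.522 | 1.613 | 1 |
| beecount | 1.631 | 1.631 | 1.021 | 0.835 | 0.828 | 1 |
| cse | 0.958 | 1.019 | 0.891 | 0.683 | 0.795 | 1 |
| dk14 | 2.959 | 3.33 | 2.952 | 2.892 | 3.076 | 1 |
| dk15 | 1.403 | 1.905 | 1.399 | 1.312 | 1.832 | 1 |
| dk16 | 2.967 | 2.742 | 2.512 | 2.335 | 2.119 | 1 |
| donfile | 0.709 | 0.709 | 0.603 | 0.478 | 0.298 | 1 |
| ex2 | 0.368 | 0.386 | 0.342 | 0.267 | 0.201 | 1 |
| ex4 | 1.562 | 1.241 | 1.187 | 0.923 | 1.017 | 1 |
| ex6 | 2.269 | 3.85 | 2.242 | 1.975 | 2.115 | 1 |
| ex7 | 0.992 | 1.181 | 0.994 | 0.998 | 0.878 | 1 |
| keyb | 1.093 | 1.071 | 1.075 | 0.796 | 0.848 | 1 |
| mark1 | 1.445 | 1.445 | 1.227 | 1.087 | 1.211 | 1 |
| opus | 1.344 | 1.344 | 1.283 | 1.121 | 1.138 | 1 |
| s27 | 0.756 | 1.95 | 0.765 | 0.564 | 0.512 | 1 |
| s386 | 1.251 | 1.393 | 1.121 | 0.998 | 1.218 | 1 |
| s8 | 0.736 | 0.805 | 0.732 | 0.682 | 0.602 | 1 |
| sse | 1.22 | 1.296 | 1.089 | 0.907 | 1.028 | 1 |
| ex1 | 4.102 | 2.968 | 2.342 | 1.728 | 1.832 | 2 |
| kirkman | 1.693 | 1.844 | 1.439 | 1.127 | 1.248 | 2 |
| planet | 4.122 | 4.122 | 2.456 | 2.028 | 2.195 | 2 |
| planet1 | 4.122 | 4.122 | 2.456 | 2.028 | 2.296 | 2 |
| pma | 1.37 | 1.37 | 1.253 | 0.803 | 0.889 | 2 |
| s1 | 2.685 | 3.13 | 2.518 | 2.048 | 2.325 | 2 |
| s1488 | 3.982 | 4.096 | 3.548 | 1.883 | 2.451 | 2 |
| s1494 | 3.079 | 3.178 | 2.982 | 2.358 | 2.869 | 2 |
| s1a | 1.322 | 2.01 | 1.208 | 0.885 | 1.132 | 2 |
| s208 | 1.367 | 2.82 | 1.249 | 0.957 | 1.371 | 2 |
| styr | 4.044 | 4.771 | 3.187 | 2.632 | 2.898 | 2 |
| tma | 1.589 | 1.314 | 1.321 | 0.918 | 1.145 | 2 |
| sand | 1.149 | 1.149 | 0.988 | 0.617 | 0.857 | 3 |
| s420 | 1.337 | 2.82 | 1.286 | 0.892 | 0.994 | 4 |
| s510 | 1.543 | 1.543 | 1.091 | 0.852 | 0.897 | 4 |
| s820 | 2.054 | 1.801 | 1.463 | 0.843 | 1.042 | 4 |
| s832 | 2.096 | 2.087 | 1.828 | 0.932 | 1.329 | 4 |
| Total | 76.788 | 83.136 | 64.747 | 55.070 | 60.930 | |
| Percentage, % | 126.03 | 136.45 | 106.26 | 90.38 | 100 | |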
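Table 19. Experimental results (area–time products, LUTs × ns).

| Benchmark | Auto [14] | One-Hot [14] | JEDI [14] | $U_2$ | Our Approach | Group |
|---|---|---|---|---|---|---|
| bbtas | 24.49 | 24.49 | 24.26 | 39.92 | 40.10 | 0 |
| dk17 | 25.09 | 71.86 | 25.08 | 40.03 | 40.40 | 0 |
| dk27 | 14.56 | 24.76 | 19.59 | 35.60 | 36.05 | 0 |
| dk512 | 50.95 | 50.95 | 45.06 | 61.80 | 67.59 | 0 |
| ex3 | 46.19 | 46.19 | 45.97 | 57.53 | 58.47 | 0 |
| ex5 | 49.93 | 49.93 | 49.68 | 56.16 | 56.58 | 0 |
| lion | 9.88 | 24.51 | 9.88 | 29.97 | 30.13 | 0 |
| lion9 | 29.23 | 59.39 | 24.23 | 40.18 | 40.59 | 0 |
| mc | 20.34 | 35.81 | 20.32 | 31.06 | 31.33 | 0 |
| modulo12 | 33.82 | 33.82 | 33.80 | 44.75 | 44.95 | 0 |
| shiftreg | 7.61 | 22.76 | 7.24 | 15.58 | 15.80 | 0 |
| bbara | 87.91 | 87.91 | 47.12 | 49.45 | 54.69 | 1 |
| bbsse | 210.11 | 218.78 | 131.62 | 143.46 | 134.35 | 1 |
| beecount | 114.04 | 114.04 | 74.74 | 75.62 | 70.76 | 1 |
| cse | 273.17 | 403.32 | 202.11 | 188.38 | 185.59 | 1 |
| dk14 | 83.49 | 156.39 | 51.59 | 63.10 | 58.24 | 1 |
| dk15 | 77.91 | 86.32 | 61.58 | 31.21 | 36.56 | 1 |
| dk16 | 88.38 | 194.52 | 60.87 | 56.60 | 51.86 | 1 |
| donfile | 168.45 | 168.48 | 117.85 | 104.52 | 100.77 | 1 |
| ex2 | 45.32 | 45.32 | 39.97 | 40.34 | 45.63 | 1 |
| ex4 | 82.89 | 73.15 | 62.23 | 57.85 | 53.10 | 1 |
| ex6 | 141.53 | 219.78 | 124.58 | 122.61 | 117.52 | 1 |
| ex7 | 20.00 | 24.90 | 19.94 | 30.28 | 35.46 | 1 |
| keyb | 274.85 | 425.18 | 237.49 | 228.38 | 235.66 | 1 |
| mark1 | 141.63 | 141.63 | 113.52 | 111.65 | 107.12 | 1 |
| opus | 168.47 | 168.47 | 123.37 | 119.80 | 115.33 | 1 |
| s27 | 30.19 | 93.99 | 30.13 | 30.59 | 36.05 | 1 |
| s386 | 154.62 | 224.84 | 122.80 | 141.36 | 125.99 | 1 |
| s8 | 49.99 | 50.29 | 49.66 | 50.50 | 56.75 | 1 |
| sse | 210.11 | 218.78 | 171.79 | 152.83 | 149.32 | 1 |
| ex1 | 463.76 | 529.48 | 299.66 | 219.37 | 188.88 | 2 |
| kirkman | 297.07 | 376.62 | 248.91 | 197.43 | 175.10 | 2 |
| planet | 987.11 | 987.11 | 470.24 | 412.44 | 379.54 | 2 |
| planet1 | 987.11 | 987.11 | 470.24 | 412.44 | 379.54 | 2 |
| pma | 643.04 | 643.04 | 506.39 | 404.06 | 385.79 | 2 |
| s1 | 443.96 | 728.74 | 388.14 | 332.86 | 304.66 | 2 |
| s1488 | 895.31 | 992.88 | 687.11 | 528.75 | 514.54 | 2 |
| s1494 | 843.43 | 905.66 | 669.34 | 522.44 | 467.97 | 2 |
| s1a | 319.49 | 459.18 | 254.18 | 213.23 | 192.91 | 2 |
| s208 | 68.83 | 175.68 | 55.94 | 49.53 | 61.01 | 2 |
| styr | 675.82 | 923.65 | 556.17 | 432.45 | 389.32 | 2 |
| tma | 274.59 | 263.87 | 237.60 | 169.76 | 165.66 | 2 |
| sand | 1138.23 | 1138.23 | 898.91 | 679.57 | 537.74 | 3 |
| s420 | 57.51 | 175.68 | 50.78 | 42.73 | 47.23 | 4 |
| s510 | 270.19 | 270.19 | 176.39 | 117.45 | 105.13 | 4 |
| s820 | 578.95 | 535.39 | 385.09 | 285.78 | 262.12 | 4 |
| s832 | 549.04 | 515.56 | 356.77 | 268.64 | 245.40 | 4 |
| Total | 12,228.61 | 14,168.64 | 8859.93 | 7540.03 | 7035.28 | |
| Percentage, % | 173.82 | 201.39 | 125.94 | 107.17 | 100.00 | |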
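Table 20. Experimental results (power–time products, nJ).

| Benchmark | Auto | One-Hot | JEDI | $U_2$ | Our Approach | Group |
|---|---|---|---|---|---|---|
| bbtas | 2.61 | 2.61 | 2.59 | 3.30 | 3.41 | 0 |
| dk17 | 9.54 | 11.59 | 9.48 | 11.82 | 12.18 | 0 |
| dk27 | 5.67 | 4.23 | 5.67 | 7.42 | 8.66 | 0 |
| dk512 | 7.62 | 7.62 | 6.73 | 8.80 | 9.48 | 0 |
| ex3 | 2.01 | 2.01 | 2.00 | 2.62 | 2.89 | 0 |
| ex5 | 2.15 | 2.15 | 2.13 | 2.79 | 2.37 | 0 |
| lion | 2.68 | 3.08 | 2.70 | 3.55 | 3.64 | 0 |
| lion9 | 3.57 | 5.24 | 3.53 | 4.72 | 4.26 | 0 |
| mc | 2.27 | 2.87 | 2.25 | 2.94 | 3.88 | 0 |
| modulo12 | 2.70 | 2.70 | 2.72 | 3.56 | 3.98 | 0 |
| shiftreg | 1.99 | 2.29 | 1.85 | 2.51 | 3.46 | 0 |
| bbara | 2.94 | 2.94 | 2.30 | 1.97 | 1.45 | 1 |
| bbsse | 14.13 | 7.13 | 9.39 | 8.40 | 9.03 | 1 |
| beecount | 9.79 | 9.79 | 5.45 | 4.51 | 4.51 | 1 |
| cse | 6.54 | 6.23 | 5.00 | 3.90 | 4.61 | 1 |
| dk14 | 15.44 | 19.29 | 15.23 | 15.21 | 16.29 | 1 |
| dk15 | 7.29 | 10.28 | 7.18 | 6.83 | 9.57 | 1 |
| dk16 | 17.48 | 15.69 | 12.74 | 12.02 | 10.99 | 1 |
| donfile | 3.85 | 3.85 | 2.96 | 2.38 | 1.50 | 1 |
| ex2 | 1.85 | 1.94 | 1.71 | 1.35 | 1.02 | 1 |
| ex4 | 8.63 | 6.98 | 6.16 | 4.85 | 5.40 | 1 |
| ex6 | 13.38 | 23.50 | 12.70 | 11.53 | 12.43 | 1 |
| ex7 | 4.96 | 5.88 | 4.96 | 5.04 | 4.45 | 1 |
| keyb | 6.99 | 7.46 | 6.38 | 4.91 | 5.26 | 1 |
| mark1 | 8.90 | 8.90 | 6.96 | 6.39 | 7.21 | 1 |
| opus | 8.09 | 8.09 | 7.19 | 6.40 | 6.56 | 1 |
| s27 | 3.80 | 10.18 | 3.84 | 2.88 | 2.64 | 1 |
| s386 | 7.44 | 8.03 | 6.26 | 5.64 | 6.98 | 1 |
| s8 | 4.09 | 4.50 | 4.04 | 3.83 | 3.42 | 1 |
| sse | 7.77 | 7.66 | 6.24 | 5.33 | 6.14 | 1 |
| ex1 | 27.18 | 21.24 | 13.24 | 9.48 | 10.18 | 2 |
| kirkman | 11.97 | 11.97 | 9.18 | 6.74 | 7.54 | 2 |
| planet | 31.06 | 31.06 | 13.12 | 10.72 | 11.73 | 2 |
| planet1 | 31.06 | 31.06 | 13.12 | 10.72 | 12.27 | 2 |
| pma | 9.37 | 9.37 | 7.38 | 4.51 | 5.04 | 2 |
| s1 | 18.34 | 23.04 | 16.02 | 12.62 | 14.17 | 2 |
| s1488 | 28.75 | 31.04 | 22.57 | 11.19 | 14.66 | 2 |
| s1494 | 20.61 | 21.80 | 18.15 | 13.69 | 16.78 | 2 |
| s1a | 8.62 | 11.39 | 7.14 | 4.97 | 6.42 | 2 |
| s208 | 7.84 | 15.98 | 6.99 | 5.27 | 7.60 | 2 |
| styr | 29.39 | 36.72 | 21.88 | 16.26 | 18.20 | 2 |
| tma | 9.70 | 8.89 | 8.05 | 5.19 | 6.54 | 2 |
| sand | 9.91 | 9.91 | 7.79 | 4.24 | 5.62 | 3 |
| s420 | 7.69 | 15.98 | 7.26 | 4.76 | 5.22 | 4 |
| s510 | 8.69 | 8.69 | 6.01 | 4.55 | 4.72 | 4 |
| s820 | 13.51 | 11.76 | 8.29 | 4.63 | 5.69 | 4 |
| s832 | 14.38 | 13.62 | 10.52 | 5.01 | 7.09 | 4 |
| Total | 484.24 | 528.25 | 365.05 | 301.91 | 337.11 | |
| Percentage, % | 143.64 | 156.70 | 108.29 | 89.56 | 100 | |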
However, our approach wins starting from group G1. As follows from Table 10 and Table 11, using the model $U_3$ reduces the LUT counts for the groups G1–G4. The auto-based FSMs require 29.30% more LUTs than the $U_3$-based FSMs for G1 and 61.45% more for the groups G2–G4. The one-hot-based FSMs require 67.20% more LUTs (G1) and 79.88% more (G2–G4). The JEDI-based FSMs require 7.32% more LUTs (G1) and 31.45% more (G2–G4). The $U_2$-based FSMs require 2.55% more LUTs (G1) and 9.88% more (G2–G4). So, the gain from applying the proposed approach increases with the growth of the number of FSM inputs and state variables.
Let us explain the nature of this situation. Starting from G1, the condition (4) is violated. This means that the methods of functional decomposition should be applied for FSMs based on auto, one-hot and JEDI. However, both FSMs $U_2$ and $U_3$ are based on the methods of structural decomposition. As follows from [13], using the SD-based methods allows improving LUT counts compared with the FD-based methods. A similar phenomenon also occurs in our case. There is only one set of additional variables in FSMs $U_3$. However, FSMs $U_2$ have two such sets. As follows from the research results, the implementation of systems (5) and (10) requires more internal resources than the implementation of the system (7). This advantage of FSMs $U_3$ in relation to FSMs $U_2$ explains the gain in LUTs that the method proposed in this article gives.
From the analysis of Table 10, it follows that for group G1, the following phenomenon takes place. In some cases, the circuits of FSMs $U_2$ require fewer LUTs than the equivalent FSMs $U_3$. This situation takes place for the benchmarks bbara, dk15, ex2, ex7, keyb, s27, and s8. However, for most other benchmarks of G1, the circuits of FSMs $U_3$ have better LUT counts than the equivalent FSMs $U_2$. Let us explain this phenomenon.
In LUT-based FSMs, the LUT counts depend on the relation between $NA(\varphi_k)$ and $I$. Both FSMs $U_2$ and $U_3$ include logic blocks generating the outputs $y_n \in Y$. Obviously, these blocks consume the same number of LUTs. So, the difference in LUTs depends on the LUT counts for the other blocks of these FSMs. For FSMs $U_2$, the number of LUTs depends on the distribution of FSM inputs among the functions belonging to SBFs (5), (9) and (10). For FSMs $U_3$, the LUT count depends on the relation between the value of $2R_Q$ and the number of LUT inputs, $I$. If the condition (4) holds but the condition $2R_Q \leq I$ is violated, then there are fewer LUTs in the circuits of FSMs $U_2$ compared to the circuits of equivalent FSMs $U_3$. We think that this situation takes place for the benchmarks bbara, dk15, ex2, ex7, keyb, s27, and s8. For other benchmarks of G1, the following situation takes place: the condition (4) is violated but the condition $2R_Q \leq I$ holds. As a result, for these benchmarks, there are fewer LUTs in the circuits of FSMs $U_3$ compared to the circuits of equivalent FSMs $U_2$. It seems that this situation takes place for most benchmarks from the groups G2–G4. As a result, our approach obtains better LUT counts for all benchmarks from these groups except s208 and s420 (Table 11).
As follows from Table 12, our approach produces slightly faster LUT-based FSM circuits compared to three of the other investigated methods. The average gain is equal to (1) 6.14% (compared with auto-based FSMs); (2) 6.9% (relative to one-hot-based FSMs); and (3) 1.42% (compared with $U_2$-based FSMs). The gain relative to $U_2$-based FSMs is especially important. It shows that our method not only improves the LUT counts, but also does not degrade the performance compared to three-block FSMs $U_2$. Note that our approach loses slightly (only 0.5%) in the performance of the obtained FSM circuits relative to JEDI-based FSMs.
For the group G0 (Table 13), our approach provides a gain relative to $U_2$-based FSMs (7.19%). However, the other investigated methods win in the values of maximum operating frequency. The auto-based state encoding provides a gain of 2.96%, the one-hot-based state encoding a gain of 0.44%, and the JEDI-based state encoding a gain of 3.89%. It means that our approach should not be applied if the number of LUT inputs is not less than the total number of FSM inputs ($L$) and state variables ($R$).
So, for the group G0, there is a performance loss of SD-based FSMs in comparison with FD-based FSMs. This loss can be explained in the following way. Because the condition (4) holds, there is only a single logic level in the circuits of FD-based FSMs (auto, one-hot, JEDI). However, as follows from Figure 4 and Figure 6, there are three logic levels in the circuits of $U_2$-based FSMs and two logic levels in the circuits of $U_3$-based FSMs. Therefore, the SD-based FSMs produce slower circuits compared to their FD-based counterparts.
As follows from Table 14, for the group G1, our approach produces faster circuits than both auto- and one-hot-based FSMs. Our gain is equal to 3.92% and 4.04%, respectively. However, the FSM circuits produced by the two other methods are slightly faster than the $U_3$-based circuits: the JEDI-based FSMs win by 2.9% and the $U_2$-based FSMs win by 0.93%. Thus, the number of logic levels in the FD-based FSMs has increased, but still remains smaller than in the equivalent SD-based FSMs. The analysis of Table 15 shows that, for the groups G2–G4, only the $U_2$-based FSM circuits are slightly faster than the equivalent circuits based on our approach (by 0.09%). However, our approach produces faster circuits than auto (by 15.38%), one-hot (by 15.6%) and JEDI (by 4.78%).
Note that different FPGA-based circuits of equivalent devices can also be compared using such estimates as the number of flip-flops in the circuit, its power consumption, the product of the number of LUTs and the cycle time (the area–time characteristic), and the product of the power consumption and the cycle time (the power–time characteristic). We also compared these characteristics of FSM circuits for the models used in the research. The numbers of flip-flops used in FSM circuits are shown in Table 16 and Table 17. Table 18 contains information about the power consumption. The area–time characteristics are shown in Table 19. The power–time characteristics are shown in Table 20.
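Both integral characteristics follow directly from the LUT count, the power consumption and the maximum operating frequency. The Python sketch below is an illustration with hypothetical function names; it assumes that the cycle time is the reciprocal of the maximum operating frequency (in MHz), so the area–time product is obtained in LUTs × ns and the power–time product in nJ. The example uses the auto-based values reported for the benchmark bbtas (5 LUTs, 204.16 MHz, 0.533 W).

```python
def cycle_time_ns(f_max_mhz):
    """Cycle time in nanoseconds for a maximum operating frequency given in MHz."""
    return 1000.0 / f_max_mhz


def area_time(luts, f_max_mhz):
    """Area-time product: LUT count multiplied by the cycle time (LUTs x ns)."""
    return luts * cycle_time_ns(f_max_mhz)


def power_time_nj(power_w, f_max_mhz):
    """Power-time product: power (W) multiplied by the cycle time (ns), i.e. nJ."""
    return power_w * cycle_time_ns(f_max_mhz)


# Auto-based FSM for the benchmark bbtas.
print(round(area_time(5, 204.16), 2))          # 24.49
print(round(power_time_nj(0.533, 204.16), 2))  # 2.61
```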
As follows from Table 16, our method requires significantly more flip-flops than all other methods except the one-hot approach. For the methods auto, JEDI and $U_2$, the number of flip-flops is the same as the number of bits in the state codes $K(a_m)$. For the proposed FSM $U_3$, the number of flip-flops is equal to twice the number of bits in the codes of COs. Because of this, the FSMs based on the methods auto, JEDI and $U_2$ use on average 39.37% fewer flip-flops than our method.
However, this is not entirely true if we consider an FSM as a block of some digital system. It is known that the outputs of the Mealy FSM are not stable. They can change when the input signals change. The FSM inputs are the outputs of the remaining system blocks. This phenomenon can lead to malfunctions of the digital system. To eliminate possible failures, an intermediate register is introduced into the system. The FSM outputs are recorded in this register after the end of transient processes in the remaining blocks of the system. So, to find the required number of flip-flops, it is necessary to add the value of $N$ (the number of FSM outputs) to the value obtained from the Vivado reports. For example, there are 7 flip-flops in FSM s1494 for the model $U_2$ and 16 flip-flops for the model $U_3$ (Table 16). As follows from Table 7, $N = 19$ for FSM s1494. So, as a block interacting with other blocks of a digital system, the $U_2$-based FSM s1494 requires 26 flip-flops. Using the same approach, we can create Table 17.
The proposed method does not require such an additional output register, because the codes of COs are written to the registers. Therefore, for the model $U_3$, the FSM outputs are stable after being written to the registers. So, when choosing an FSM model, the designer must add the number of outputs to all numbers from Table 16 except for the numbers obtained for $U_3$-based FSMs. This fact explains the coincidence of the columns “Our Approach” of Table 16 and Table 17.
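The transition from Table 16 to Table 17 amounts to a single addition, sketched below in Python. The function name and the flag `needs_output_register` are hypothetical; the numbers in the example are those quoted above for the benchmark s1494.

```python
def system_flip_flops(state_ffs, n_outputs, needs_output_register=True):
    """Flip-flops of an FSM used as a block of a digital system.

    An extra N-bit output register is added unless the model already keeps
    its outputs stable (as assumed here for the U3-based FSMs).
    """
    return state_ffs + (n_outputs if needs_output_register else 0)


N = 19  # number of outputs of the benchmark s1494
print(system_flip_flops(7, N))                                 # U2-based FSM: 26
print(system_flip_flops(16, N, needs_output_register=False))   # U3-based FSM: 16
```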
As follows from Table 17, our method uses fewer flip-flops than the other methods studied. The FSMs based on the methods auto, JEDI and $U_2$ require 41.06% more flip-flops than our approach, and the FSMs based on the one-hot approach require 466.18% more.
To estimate the power consumption, we also used Vivado. Vivado uses the value of maximum operating frequency achieved for each benchmark and calculates the value of power consumption basing on this frequency. To conduct the research, the core voltage (VCCINT) was set to 1.0V. The data in the Table 18 are taken from the Vivado Power Reports.
As follows from Table 18, the $U_3$-based FSMs consume more power than the equivalent $U_2$-based FSMs (the loss is 9.62% on average). We think this is because (1) $U_3$-based FSMs have more flip-flops than the equivalent $U_2$-based FSMs and (2) the switching activity of the flip-flops in $U_3$ is significantly higher than in the equivalent $U_2$-based FSMs. However, our method reduces the power consumption compared to the FSM circuits based on auto (26.03%), one-hot (36.45%) and JEDI (6.26%).
Let us point out that Table 18 shows the power consumption characteristics for FSMs as stand-alone units. If we consider an FSM as a part of a digital system, the situation can change significantly in favor of our method, because the other models need the additional output register accounted for in Table 17.
So far, we have discussed each FSM circuit characteristic separately. However, the quality of FSM circuits is often evaluated by integral estimates. One such estimate is the area–time product, which shows how much chip area is used to achieve a certain cycle time. In the case of LUT-based FSM circuits, the required FPGA chip area is usually estimated by the number of LUTs used [12]. This approach is adopted in our article, and the results are shown in Table 19.
As follows from Table 19, our approach provides an average gain of 7.17% compared to the equivalent $U_2$-based FSMs. The gain compared to the other methods is even more significant: (1) 73.82% compared to auto; (2) 101.39% compared to one-hot; and (3) 25.94% compared to JEDI. We do not provide tables for each of the FSM groups here. However, we conducted such a study, and its results showed the following. Our approach performs worst for the group G0, where we lose (1) 32.45% compared to auto; (2) 3.79% compared to one-hot; (3) 33.96% compared to JEDI; and (4) 2.04% compared to $U_2$-based FSMs. The gain starts with the group G1. In this group, our method wins (1) 36.84% with respect to auto; (2) 75.98% with respect to one-hot; (3) 4.08% with respect to JEDI; and (4) 1.57% compared to the equivalent $U_2$-based FSMs. The greatest gain is observed for the most complex FSMs belonging to the groups G2–G4. For these groups, our method wins (1) 97.68% with respect to auto; (2) 120.88% with respect to one-hot; (3) 39.76% with respect to JEDI; and (4) 10.13% compared to the equivalent $U_2$-based FSMs. So, the gain from the application of our method increases as the FSM complexity increases.
The power–time (power–delay) product shows how much energy is spent on the execution of one cycle of operation [62]. For the benchmarks discussed, the cycle time is measured in nanoseconds. Since the power is measured in Watts, the resulting power–time products are presented in nanojoules (nJ). These results are shown in Table 20.
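Both integral estimates can be sketched in a few lines. The sketch assumes that the cycle time is the reciprocal of the maximum operating frequency, that the area–time product is the LUT count multiplied by the cycle time, and that the power–time product is the power multiplied by the cycle time. The function names and the input values are hypothetical and do not correspond to any benchmark in the article.

```python
def cycle_time_ns(fmax_mhz: float) -> float:
    """Cycle time in nanoseconds for a maximum operating frequency in MHz."""
    return 1000.0 / fmax_mhz


def area_time_product(lut_count: int, fmax_mhz: float) -> float:
    """Area-time product in LUT*ns, with area estimated by the LUT count."""
    return lut_count * cycle_time_ns(fmax_mhz)


def power_time_product_nj(power_w: float, fmax_mhz: float) -> float:
    """Energy per cycle in nanojoules: watts multiplied by nanoseconds."""
    return power_w * cycle_time_ns(fmax_mhz)


# Hypothetical circuit (illustrative values only, not taken from the tables).
luts, fmax, power = 26, 200.0, 1.3
print(area_time_product(luts, fmax))
print(power_time_product_nj(power, fmax))
```

A circuit with fewer LUTs can therefore still have a worse power–time product if it draws more power at a similar frequency.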
As follows from Table 20, the $U_3$-based FSMs have higher energy values than the equivalent $U_2$-based FSMs (the loss is 10.44% on average). We think this is because (1) $U_3$-based FSMs have more flip-flops than the equivalent $U_2$-based FSMs and (2) the switching activity of the flip-flops in $U_3$ is significantly higher than in the equivalent $U_2$-based FSMs. However, $U_3$-based FSMs require less energy than the FSM circuits based on auto (43.64%), one-hot (56.70%) and JEDI (8.29%).
For a better understanding of the experimental results, we created Table 21. The first column of this table names the studied characteristics. The remaining columns contain the total values of these characteristics for each of the studied methods. The best values for each of the characteristics are shown in bold. The goal of our method is to reduce the number of LUTs without a significant decrease in frequency in relation to three-level $U_2$-based FSMs. For this reason, in the “Gain” column, we show the gain or loss (negative gain) of our method with respect to $U_2$-based FSMs.
As follows from Table 21, our method reduces the LUT counts (the chip area occupied by the FSM circuit) compared to the equivalent $U_2$-based FSMs having three logic blocks. The results of the experiments show that there is no degradation in FSM performance. On the contrary, there is a slight gain in this characteristic (1.42%). So, the results of our experiments show that the proposed approach can be used instead of other models starting from the simple FSMs (the group G1). However, the proposed method should not be used if the dominant factor determining the FSM circuit optimality is its power consumption. We think that the proposed model can be used in CAD systems targeting LUT-based Mealy FSMs if the dominant factor determining the FSM circuit optimality is either the number of LUTs or the area–time product.
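The “Gain” column can be recomputed from the totals for the $U_2$-based FSMs and for our approach. The sketch below assumes that the difference is normalized by the value obtained for our approach and that a higher value is better only for the frequency; this normalization is inferred from the numbers and not stated explicitly in the article. The names `TOTALS`, `GAINS` and `gain_percent` are hypothetical.

```python
# Totals from Table 21: (U2-based FSMs, our approach, higher_is_better).
TOTALS = {
    "LUT counts": (1323, 1234, False),
    "Maximum operating frequency, MHz": (8536.27, 8658.86, True),
    "FFs without output register": (251, 414, False),
    "FFs with output register": (584, 414, False),
    "Power, Watts": (55.070, 60.930, False),
    "Area-time products": (7540.03, 7035.28, False),
    "Power-time products, nJ": (301.91, 337.11, False),
}


def gain_percent(other: float, ours: float, higher_is_better: bool) -> float:
    """Signed gain of our approach in percent; negative values are losses.

    Assumption: the difference is normalized by the value of our approach.
    """
    diff = (ours - other) if higher_is_better else (other - ours)
    return round(100.0 * diff / ours, 2)


GAINS = {name: gain_percent(*row) for name, row in TOTALS.items()}
for name, gain in GAINS.items():
    print(f"{name}: {gain:+.2f}")
```

With this normalization, the power–time loss comes out as 10.44%, which is the figure quoted in the text.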

7. Conclusions

Nowadays, the majority of digital systems are implemented using FPGAs. So, FPGAs are used for implementing circuits of FSMs representing various sequential blocks. As the complexity of the FSMs (the numbers of inputs, outputs and states) increases, the mismatch between this complexity and the very small number of LUT inputs increases, too. Modern LUTs have around six inputs, which is still rather small compared with the numbers of literals in the SBFs representing FSM circuits. This leads to using various methods of functional decomposition in LUT-based FSM design. It is known [39] that functional decomposition leads to multi-level LUT-based FSM circuits with spaghetti-type interconnections.
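The effect of the small number of LUT inputs can be illustrated by a simple lower bound. A tree of depth d built from k-input LUTs can depend on at most k^d variables, so a function that really depends on more variables needs more levels. The sketch below computes this bound for a few R + L values from Table 7, treating R + L as the largest possible number of arguments of one function in a single-block circuit (an assumption made for the illustration). The function name `min_lut_levels` is hypothetical, and the bound says nothing about the number of LUTs a real mapping uses.

```python
def min_lut_levels(num_arguments: int, lut_inputs: int = 6) -> int:
    """Lower bound on the number of LUT levels for a function that depends
    on num_arguments variables, when each LUT has lut_inputs inputs.

    A tree of depth d built from k-input LUTs can see at most k**d
    variables, so the depth must satisfy k**d >= num_arguments.
    """
    if num_arguments < 1 or lut_inputs < 2:
        raise ValueError("need at least one argument and two LUT inputs")
    levels, reachable = 1, lut_inputs
    while reachable < num_arguments:
        reachable *= lut_inputs
        levels += 1
    return levels


# R + L values taken from Table 7 (state variables plus inputs).
for name, r_plus_l in [("lion", 5), ("s1494", 15), ("s420", 27)]:
    print(name, min_lut_levels(r_plus_l))
```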
In many cases, the characteristics of FPGA-based FSM circuits can be improved by applying methods of structural decomposition instead of methods of functional decomposition [13]. Our research [15] shows that three-block LUT-based Mealy FSM circuits require fewer LUTs than some of their counterparts. But this gain comes at the cost of introducing some additional functions, whose generation requires additional internal chip resources. This is the main disadvantage of the three-block FSM circuits.
In this article, we propose to use the codes of collections of outputs to represent both the outputs and the state variables of Mealy FSMs. This requires two registers keeping the codes of COs. Using this approach, it is possible to generate the FSM outputs and the codes of the transition states in parallel. This leads to Mealy FSM circuits having two levels of LUTs. These circuits require fewer LUTs than the equivalent three-block FSM circuits. The experiments show that the proposed approach reduces hardware compared with such known methods as auto and one-hot of Vivado, and JEDI. Additionally, the proposed approach gives better results than a method based on the simultaneous replacement of inputs and encoding of COs.
Compared to the three-block FSM circuits, the LUT counts are reduced by an average of 7.21% without a significant reduction in performance. The gain in LUT counts and area–time products grows with the numbers of FSM states and inputs. Our approach loses in terms of power consumption (on average 9.62%) and power–time products (on average 10.44%). As the experiments show, the proposed two-block FSMs have practically the same cycle times (maximum operating frequencies) as their three-block counterparts. This analysis allows us to conclude that the proposed method can be used for improving the LUT counts of various FPGA-based sequential devices.

Author Contributions

Conceptualization, A.B., L.T. and K.K.; methodology, A.B., L.T. and K.K.; formal analysis, A.B., L.T. and K.K.; writing—original draft preparation, A.B., L.T. and K.K.; supervision, A.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available in the article.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CAD: computer-aided design
CLB: configurable logic block
CO: collection of outputs
DST: direct structure table
ECO: encoding of collections of outputs
FD: functional decomposition
FSM: finite-state machine
FPGA: field-programmable gate array
IMF: input memory function
LUT: look-up table
RI: replacement of inputs
SBF: system of Boolean functions
SD: structural decomposition
STG: state transitions graph
STT: state transition table
SOP: sum of products

References

  1. Grout, I. Digital Systems Design with FPGAs and CPLDs; Elsevier Science: Amsterdam, The Netherlands, 2011.
  2. Ruiz-Rosero, J.; Ramirez-Gonzalez, G.; Khanna, R. Field Programmable Gate Array Applications—A Scientometric Review. Computation 2019, 7, 63.
  3. Gajski, D.D.; Abdi, S.; Gerstlauer, A.; Schirner, G. Embedded System Design: Modeling, Synthesis and Verification; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2009.
  4. Baranov, S. Finite State Machines and Algorithmic State Machines: Fast and Simple Design of Complex Finite State Machines; Amazon: Seattle, WA, USA, 2018; p. 185.
  5. Baranov, S. Logic Synthesis of Control Automata; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1994.
  6. Czerwinski, R.; Kania, D. Finite State Machine Logic Synthesis for Complex Programmable Logic Devices; Volume 231 of Lecture Notes in Electrical Engineering; Springer: Berlin/Heidelberg, Germany, 2013.
  7. Gazi, O.; Arli, A. State Machines Using VHDL: FPGA Implementation of Serial Communication and Display Protocols; Springer: Berlin/Heidelberg, Germany, 2021; p. 326.
  8. Koo, B.; Bae, J.; Kim, S.; Park, K.; Kim, H. Test case generation method for increasing software reliability in Safety-Critical Embedded Systems. Electronics 2020, 9, 797.
  9. Baranov, S. High-Level Synthesis of Digital Systems: For Data-Path and Control Dominated Systems; Amazon: Seattle, WA, USA, 2018; p. 207.
  10. Zhao, X.; He, Y.; Chen, X.; Liu, Z. Human-Robot collaborative Assembly Based on Eye-Hand and a Finite State Machine in a Virtual Environment. Appl. Sci. 2021, 11, 5754.
  11. Jozwiak, L.; Slusarczyk, A.; Chojnacki, A. Fast and compact sequential circuits for the FPGA-based reconfigurable systems. J. Syst. Archit. 2003, 49, 227–246.
  12. Islam, M.M.; Hossain, M.S.; Shahjalal, M.D.; Hasan, M.K.; Jang, Y.M. Area-time efficient hardware implementation of modular multiplication for elliptic curve cryptography. IEEE Access 2020, 8, 73898–73906.
  13. Barkalov, A.; Titarenko, L.; Krzywicki, K. Structural Decomposition in FSM Design: Roots, Evolution, Current State—A Review. Electronics 2021, 10, 1174.
  14. Barkalov, A.; Titarenko, L.; Krzywicki, K.; Saburova, S. Improving the Characteristics of Multi-Level LUT-Based Mealy FSMs. Electronics 2020, 9, 1859.
  15. Barkalov, A.; Titarenko, L.; Krzywicki, K. Reducing LUT Count for FPGA-Based Mealy FSMs. Appl. Sci. 2020, 10, 5115.
  16. Micheli, G.D. Synthesis and Optimization of Digital Circuits; McGraw-Hill: Cambridge, MA, USA, 1994.
  17. Kubica, M.; Kania, D.; Kulisz, J. A technology mapping of fsms based on a graph of excitations and outputs. IEEE Access 2019, 7, 16123–16131.
  18. Sklarova, D.; Sklarov, V.A.; Sudnitson, A. Design of FPGA-Based Circuits Using Hierarchical Finite State Machines; TUT Press: Tallinn, Estonia, 2012.
  19. AMD Xilinx FPGAs. Available online: https://www.xilinx.com/products/silicon-devices/fpga.html (accessed on 25 May 2022).
  20. Trimberger, S.M. Field-Programmable Gate Array Technology; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012.
  21. Mishchenko, A.; Brayton, R.; Jiang, J.H.R.; Jang, S. Scalable don’t-care-based logic optimization and resynthesis. ACM Trans. Reconfigurable Technol. Syst. (TRETS) 2011, 4, 1–23.
  22. Kubica, M.; Opara, A.; Kania, D. Logic Synthesis Strategy Oriented to Low Power Optimization. Appl. Sci. 2021, 11, 8797.
  23. Nguyen, T.T.; Kim, S.; Eom, Y.; Lee, H. Area-Time Efficient Hardware Architecture for CRYSTALS-Kyber. Appl. Sci. 2022, 12, 5305.
  24. Ney, J.; Hammoud, B.; Dörner, S.; Herrmann, M.; Clausius, J.; ten Brink, S.; Wehn, N. Efficient FPGA Implementation of an ANN-Based Demapper Using Cross-Layer Analysis. Electronics 2022, 11, 1138.
  25. Jarrah, A.; Haymoor, Z.S.; Al-Masri, H.M.; Almomany, A. High-Performance Implementation of Power Components on FPGA Platform. J. Electr. Eng. Technol. 2022, 17, 1555–1571.
  26. Nikolic, S.; Zgheib, G.; Ienne, P. Detailed Placement for Dedicated LUT-Level FPGA Interconnect. ACM Trans. Reconfig. Technol. Syst. (TRETS) 2022.
  27. Skliarova, I. A Survey of Network-Based Hardware Accelerators. Electronics 2022, 11, 1029.
  28. Senhadji-Navarro, R.; Garcia-Vargas, I. Mapping Arbitrary Logic Functions onto Carry Chains in FPGAs. Electronics 2022, 11, 27.
  29. Scholl, C. Functional Decomposition with Application to FPGA Synthesis; Kluwer Academic Publishers: Boston, MA, USA, 2001.
  30. Kubica, M.; Kania, D. Technology mapping oriented to adaptive logic modules. Bull. Pol. Acad. Sci. 2019, 67, 947–956.
  31. Mishchenko, A.; Chatterjee, S.; Brayton, R. Improvements to technology mapping for LUT-based FPGAs. IEEE Trans. CAD 2006, 27, 240–253.
  32. Brayton, R.; Mishchenko, A. ABC: An Academic Industrial-Strength Verification Tool. In Computer Aided Verification; Touili, T., Cook, B., Jackson, P., Eds.; Springer: Berlin/Heidelberg, Germany, 2010; pp. 24–40.
  33. Soloviev, V.V. Architectures Xilinx FPGA: Family CPLD and FPGA 7; Hot-line-Telecom: Moskwa, Russia, 2016; p. 392. (In Russian)
  34. Altera. Cyclone IV Device Handbook. Available online: http://www.altera.com/literature/hb/cyclone-iv/cyclone4-handbook.pdf (accessed on 25 May 2022).
  35. El-Maleh, A.H. A Probabilistic Tabu Search State Assignment Algorithm for Area and Power Optimization of Sequential Circuits. Arab. J. Sci. Eng. 2020, 45, 6273–6285.
  36. Feng, W.; Greene, J.; Mishchenko, A. Improving FPGA performance with a S44 LUT structure. In Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA, 25–27 February 2018; pp. 61–66.
  37. Barkalov, A.; Titarenko, L.; Krzywicki, K.; Saburova, S. Improving Characteristics of LUT-Based Mealy FSMs with Twofold State Assignment. Electronics 2021, 10, 901.
  38. Senhadji-Navarro, R.; Garcia-Vargas, I. Methodology for Distributed-ROM-based Implementation of Finite State Machines. IEEE Trans.-Comput.-Aided Des. Integr. Circuits Syst. 2020, 40, 2411–2415.
  39. Kubica, M.; Opara, A.; Kania, D. Technology Mapping for LUT-Based FPGA; Springer: Berlin/Heidelberg, Germany, 2021; p. 208.
  40. Barkalov, A.; Titarenko, L.; Mielcarek, K.; Chmielewski, S. Logic Synthesis for FPGA-Based Control Units—Structural Decomposition in Logic Design; Volume 636 of Lecture Notes in Electrical Engineering; Springer: Berlin/Heidelberg, Germany, 2020.
  41. Salauyou, V.; Ostapczuk, M. State Assignment of Finite-State Machines by Using the Values of Output Variables. In Theory and Applications of Dependable Computer Systems. DepCoS-RELCOMEX 2020. Advances in Intelligent Systems and Computing; Zamojski, W., Mazurkiewicz, J., Sugier, J., Walkowiak, T., Kacprzyk, J., Eds.; Springer: Cham, Switzerland, 2020; Volume 1173, pp. 543–553.
  42. Solov’ev, V.V. Implementation of finite-state machines based on programmable logic ICs with the help of the merged model of Mealy and Moore machines. J. Commun. Technol. Electron. 2013, 58, 172–177.
  43. Park, J.; Yoo, H. Area-efficient fault tolerance encoding for Finite State Machines. Electronics 2020, 9, 1110.
  44. Sentovich, E.M.; Singh, K.J.; Lavagno, L.; Moon, C.; Murgai, R.; Saldanha, A.; Sangiovanni-Vincentelli, A. SIS: A System for Sequential Circuit Synthesis; University of California: Berkeley, CA, USA, 1992.
  45. ABC System. Available online: https://people.eecs.berkeley.edu/~alanmi/abc/ (accessed on 25 May 2022).
  46. Baranov, S. From Algorithm to Digital System: HSL and RTL tool Sinthagate in Digital System Design; Amazon: Seattle, WA, USA, 2020; p. 76.
  47. Vivado Design Suite User Guide: Synthesis; UG901 (v2019.1); Available online: https://www.xilinx.com/support/documentation/sw_manuals/xilinx2019_1/ug901-vivado-synthesis.pdf (accessed on 25 May 2022).
  48. Xilinx Vitis. Available online: https://www.xilinx.com/products/design-tools/vitis/vitis-platform.html (accessed on 25 May 2022).
  49. Quartus Prime. Available online: https://www.intel.pl/content/www/pl/pl/software/programmable/quartus-prime/overview.html (accessed on 25 May 2022).
  50. Khatri, S.P.; Gulati, K. Advanced Techniques in Logic Synthesis, Optimizations and Applications; Springer: New York, NY, USA, 2011.
  51. Sklyarov, V. Synthesis and implementation of RAM-based finite state machines in FPGAs. In International Workshop on Field Programmable Logic and Applications; Springer: Berlin/Heidelberg, Germany, 2000; pp. 718–727.
  52. Tiwari, A.; Tomko, K.A. Saving power by mapping finite-state machines into embedded memory blocks in FPGAs. Proc. Des. Autom. Test Eur. Conf. Exhib. 2004, 2, 916–921.
  53. Wilkes, M.V.; Stringer, J.B. Micro-programming and the design of the control circuits in an electronic digital computer. In Mathematical Proceedings of the Cambridge Philosophical Society; Cambridge University Press: Cambridge, MA, USA, 1953; Volume 49, pp. 230–238.
  54. Chapman, K. Multiplexer Design Techniques for Data-Path Performance with Minimized Routing Resources; Xilinx All Programmable; Xilinx Inc.: San Jose, CA, USA, 2014; Version 1.2; pp. 1–32.
  55. Achasova, S. Synthesis Algorithms for Automata with PLAs; M: Soviet Radio: Moscow, Russia, 1987. (In Russian)
  56. McElvain, K. LGSynth93 Benchmark; Mentor Graphics: Wilsonville, OR, USA, 1993.
  57. Benini, L.; Bogliolo, A.; De Micheli, G. A survey of design techniques for system-level dynamic power management. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2000, 8, 299–316.
  58. De Micheli, G.; Brayton, R.K.; Sangiovanni-Vincentelli, A. Optimal state assignment for finite state machines. IEEE Trans.-Comput.-Aided Des. Integr. Circuits Syst. 2006, 4, 269–285.
  59. El-Maleh, A.H. A probabilistic pairwise swap search state assignment algorithm for sequential circuit optimization. Integration 2017, 56, 32–43.
  60. VC709 Evaluation Board for the Virtex-7 FPGA User Guide; UG887 (v1.6); Xilinx, Inc.: San Jose, CA, USA, 2019.
  61. Barkalov, A.; Titarenko, L.; Krzywicki, K.; Saburova, S. Improving Characteristics of LUT-Based Three-Block Mealy FSMs’ Circuits. Electronics 2022, 11, 950.
  62. Han, Z. The power-delay product and its implication to CMOS Inverter. J. Phys. Conf. Ser. 2021, 1754, 1–12.
Figure 1. Structural diagram of FSM $U_1$.
Figure 2. Equivalent fragments of STG (a), STT (b) and DST (c).
Figure 3. Structural diagram of LUT-based FSM $U_1$.
Figure 4. Structural diagram of Mealy FSM $U_2$.
Figure 5. Illustration of the main idea of proposed method.
Figure 6. Structural diagram of LUT-based Mealy FSM $U_3$.
Figure 7. STG of Mealy FSM $S_1$.
Figure 8. The outcome of encoding of COs for FSM $S_1$.
Table 1. The main notation used in the article.
| Notation | Meaning |
|---|---|
| $A = \{a_1, \ldots, a_M\}$ | The set of FSM internal states having $M$ elements. |
| $B = \{b_1, \ldots, b_J\}$ | The set of additional variables replacing FSM inputs having $J$ elements, where $J \ll L$. |
| $I$ | The number of LUT inputs. |
| $K(a_m)$ | The binary code of state $a_m \in A$. |
| $K(Y_q)$ | The binary code of collection of outputs $Y_q \subseteq Y$ with $R_Q = \lceil \log_2 Q \rceil$ bits. |
| $NA(\phi_k)$ | The number of literals in a function $\phi_k \in \Phi \cup Y$. |
| $P_g = \langle Y_i, Y_j \rangle$ | A pair of collections of outputs replacing a pair <state, input>. |
| $\Phi = \{D_1, \ldots, D_R\}$ | The set of FSM input memory functions having $R$ elements. |
| $Q$ | The number of collections of outputs $Y_q \subseteq Y$. |
| $R$ | The number of bits in the codes $K(a_m)$, determined as $R = \lceil \log_2 M \rceil$. |
| $T = \{T_1, \ldots, T_R\}$ | The set of state variables for creating the codes $K(a_m)$ with $R$ bits. |
| $V = \{v_1, \ldots, v_{R_Q}\}$ | The set of additional variables encoding collections of outputs from the current cycle of FSM operation having $R_Q$ bits. |
| $X = \{x_1, \ldots, x_L\}$ | The set of FSM inputs having $L$ elements. |
| $Y = \{y_1, \ldots, y_N\}$ | The set of FSM outputs having $N$ elements. |
| $Z = \{z_1, \ldots, z_{R_Q}\}$ | The set of additional variables encoding collections of outputs from the previous cycle of FSM operation having $R_Q$ bits. |
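The code lengths in Table 1 are ceilings of base-2 logarithms. A minimal sketch for the example FSM $S_1$, which has five states and six collections of outputs ($Y_0$ to $Y_5$), is given below; the helper name `code_bits` is hypothetical.

```python
def code_bits(num_items: int) -> int:
    """ceil(log2(num_items)) computed with integers; one item needs 0 bits."""
    if num_items < 1:
        raise ValueError("num_items must be positive")
    return (num_items - 1).bit_length()


M = 5  # states a1..a5 of FSM S1
Q = 6  # collections of outputs Y0..Y5 of FSM S1
R = code_bits(M)    # bits in the state codes K(a_m)
R_Q = code_bits(Q)  # bits in the CO codes K(Y_q)
print(R, R_Q)  # 3 3
```

Both values agree with the three-bit codes used in the tables for $S_1$.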
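Table 2. State transition table of Mealy FSM $S_1$.
| $a_m$ | $a_T$ | $X_h$ | $Y_h$ | CO | $h$ |
|---|---|---|---|---|---|
| $a_1$ | $a_2$ | $x_1$ | $y_1 y_2$ | $Y_1$ | 1 |
| | $a_3$ | $\bar{x}_1$ | $y_3$ | $Y_2$ | 2 |
| $a_2$ | $a_3$ | $x_1$ | $y_2 y_4$ | $Y_3$ | 3 |
| | $a_4$ | $\bar{x}_1 x_2$ | $y_3$ | $Y_2$ | 4 |
| | $a_5$ | $\bar{x}_1 \bar{x}_2$ | $y_3 y_5$ | $Y_4$ | 5 |
| $a_3$ | $a_4$ | 1 | $y_1 y_3 y_6$ | $Y_5$ | 6 |
| $a_4$ | $a_5$ | $x_3$ | $y_3 y_5$ | $Y_4$ | 7 |
| | $a_1$ | $\bar{x}_3$ | | $Y_0$ | 8 |
| $a_5$ | $a_1$ | 1 | | $Y_0$ | 9 |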
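Table 3. Modified DST of Mealy FSM $U_1(S_1)$.
| $a_m$ | $K(a_m)$ | $a_T$ | $K(a_T)$ | $X_h$ | $\Phi_h$ | $Z_h$ | $h$ |
|---|---|---|---|---|---|---|---|
| $a_1$ | 000 | $a_2$ | 001 | $x_1$ | $D_3$ | $z_1$ | 1 |
| | | $a_3$ | 010 | $\bar{x}_1$ | $D_2$ | $z_3$ | 2 |
| $a_2$ | 001 | $a_3$ | 010 | $x_1$ | $D_2$ | $z_1 z_2$ | 3 |
| | | $a_4$ | 011 | $\bar{x}_1 x_2$ | $D_2 D_3$ | $z_3$ | 4 |
| | | $a_5$ | 100 | $\bar{x}_1 \bar{x}_2$ | $D_1$ | $z_2 z_3$ | 5 |
| $a_3$ | 010 | $a_4$ | 011 | 1 | $D_2 D_3$ | $z_1 z_3$ | 6 |
| $a_4$ | 011 | $a_5$ | 100 | $x_3$ | $D_1$ | $z_2 z_3$ | 7 |
| | | $a_1$ | 000 | $\bar{x}_3$ | | | 8 |
| $a_5$ | 100 | $a_1$ | 000 | 1 | | | 9 |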
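Table 4. Table of pairs $P_g$ for Mealy FSM $U_3(S_1)$.
| $a_m$ | $a_T$ | $Y_m$ | $Y_T$ | $P_g$ | $g$ |
|---|---|---|---|---|---|
| $a_1$ | $a_2$ | $Y_0$ | $Y_1$ | $P_1$ | 1 |
| $a_1$ | $a_3$ | $Y_0$ | $Y_2$ | $P_2$ | 2 |
| $a_2$ | $a_3$ | $Y_1$ | $Y_3$ | $P_3$ | 3 |
| $a_2$ | $a_4$ | $Y_1$ | $Y_2$ | $P_4$ | 4 |
| $a_2$ | $a_5$ | $Y_1$ | $Y_4$ | $P_5$ | 5 |
| $a_3$ | $a_4$ | $Y_2$ | $Y_5$ | $P_6$ | 6 |
| $a_3$ | $a_4$ | $Y_3$ | $Y_5$ | $P_7$ | 7 |
| $a_4$ | $a_5$ | $Y_2$ | $Y_4$ | $P_8$ | 8 |
| $a_4$ | $a_1$ | $Y_2$ | $Y_0$ | $P_9$ | 9 |
| $a_4$ | $a_5$ | $Y_5$ | $Y_4$ | $P_{10}$ | 10 |
| $a_4$ | $a_1$ | $Y_5$ | $Y_0$ | $P_{11}$ | 11 |
| $a_5$ | $a_1$ | $Y_4$ | $Y_0$ | $P_{12}$ | 12 |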
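The pairs in Table 4 can be derived mechanically from the state transition table of $S_1$. The sketch below is an illustration, not the authors' tool: for every state it collects the COs produced by the transitions entering that state, and then combines each of them with the CO produced by every transition leaving it. The names `STT`, `build_pairs` and `PAIRS` are hypothetical, input conditions are omitted because they do not affect the pairs, and every state is assumed to be entered by at least one transition.

```python
# State transition table of FSM S1: (a_m, a_T, collection of outputs).
STT = [
    ("a1", "a2", "Y1"), ("a1", "a3", "Y2"),
    ("a2", "a3", "Y3"), ("a2", "a4", "Y2"), ("a2", "a5", "Y4"),
    ("a3", "a4", "Y5"),
    ("a4", "a5", "Y4"), ("a4", "a1", "Y0"),
    ("a5", "a1", "Y0"),
]


def build_pairs(stt):
    """Return {(Y_m, Y_T): a_T} for every pair of consecutive collections."""
    # Collections produced by the transitions that enter each state.
    incoming = {}
    for _, target, co in stt:
        incoming.setdefault(target, set()).add(co)

    pairs = {}
    for source, target, co in stt:
        for previous_co in incoming.get(source, ()):
            key = (previous_co, co)
            # The idea only works if a pair never points to two states.
            if pairs.setdefault(key, target) != target:
                raise ValueError(f"pair {key} is ambiguous")
    return pairs


PAIRS = build_pairs(STT)
print(len(PAIRS))  # 12
```

For $S_1$ no pair is ambiguous, so the CO codes from two adjacent cycles are enough to identify the state of transition.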
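Table 5. Table of block LZ of Mealy FSM $U_3(S_1)$.
| $a_m$ | $K(a_m)$ | $X_h$ | $Z_h$ | $h$ |
|---|---|---|---|---|
| $a_1$ | 000 | $x_1$ | $z_1$ | 1 |
| | | $\bar{x}_1$ | $z_3$ | 2 |
| $a_2$ | 001 | $x_1$ | $z_1 z_2$ | 3 |
| | | $\bar{x}_1 x_2$ | $z_3$ | 4 |
| | | $\bar{x}_1 \bar{x}_2$ | $z_2 z_3$ | 5 |
| $a_3$ | 010 | 1 | $z_1 z_3$ | 6 |
| $a_4$ | 011 | $x_3$ | $z_2 z_3$ | 7 |
| | | $\bar{x}_3$ | | 8 |
| $a_5$ | 100 | 1 | | 9 |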
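Table 6. Table of block LT of Mealy FSM $U_3(S_1)$.
| $Y_m$ | $K(Y_m)$ | $Y_T$ | $K(Y_T)$ | $a_T$ | $T_g$ | $g$ |
|---|---|---|---|---|---|---|
| $Y_0$ | 000 | $Y_1$ | 100 | $a_2$ | $T_3$ | 1 |
| $Y_0$ | 000 | $Y_2$ | 001 | $a_3$ | $T_2$ | 2 |
| $Y_1$ | 100 | $Y_3$ | 110 | $a_3$ | $T_2$ | 3 |
| $Y_1$ | 100 | $Y_2$ | 001 | $a_4$ | $T_2 T_3$ | 4 |
| $Y_1$ | 100 | $Y_4$ | 011 | $a_5$ | $T_1$ | 5 |
| $Y_2$ | 001 | $Y_5$ | 101 | $a_4$ | $T_2 T_3$ | 6 |
| $Y_3$ | 110 | $Y_5$ | 101 | $a_4$ | $T_2 T_3$ | 7 |
| $Y_2$ | 001 | $Y_4$ | 011 | $a_5$ | $T_1$ | 8 |
| $Y_2$ | 001 | $Y_0$ | 000 | $a_1$ | | 9 |
| $Y_5$ | 101 | $Y_4$ | 011 | $a_5$ | $T_1$ | 10 |
| $Y_5$ | 101 | $Y_0$ | 000 | $a_1$ | | 11 |
| $Y_4$ | 011 | $Y_0$ | 000 | $a_1$ | | 12 |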
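The columns $T_g$ of Table 6 and $Z_h$ of Table 5 list the variables that are equal to 1 in the corresponding binary code, with variables numbered from the leftmost bit. This can be checked with a small sketch; the helper name `active_variables` is hypothetical, and the codes are copied from the tables for $S_1$.

```python
def active_variables(code: str, prefix: str) -> list:
    """Names of the variables equal to 1 in a binary code, numbered from the left."""
    return [f"{prefix}{i}" for i, bit in enumerate(code, start=1) if bit == "1"]


# Codes of the states and of the collections of outputs of FSM S1.
STATE_CODES = {"a1": "000", "a2": "001", "a3": "010", "a4": "011", "a5": "100"}
CO_CODES = {"Y0": "000", "Y1": "100", "Y2": "001",
            "Y3": "110", "Y4": "011", "Y5": "101"}

print(active_variables(STATE_CODES["a4"], "T"))  # ['T2', 'T3']
print(active_variables(CO_CODES["Y3"], "z"))     # ['z1', 'z2']
```

An all-zero code, such as $K(a_1)$ or $K(Y_0)$, produces an empty list, which corresponds to the empty cells in the tables.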
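Table 7. Basic characteristics of benchmarks from library [56].
| Benchmark | L | N | R + L | M/R | H | Group |
|---|---|---|---|---|---|---|
| bbara | 4 | 2 | 8 | 12/4 | 60 | 1 |
| bbsse | 7 | 7 | 12 | 26/5 | 56 | 1 |
| bbtas | 2 | 2 | 6 | 9/4 | 24 | 0 |
| beecount | 3 | 4 | 7 | 10/4 | 28 | 1 |
| cse | 7 | 7 | 12 | 32/5 | 91 | 1 |
| dk14 | 3 | 5 | 8 | 26/5 | 56 | 1 |
| dk15 | 3 | 5 | 8 | 17/5 | 32 | 1 |
| dk16 | 2 | 3 | 9 | 75/7 | 108 | 1 |
| dk17 | 2 | 3 | 6 | 16/4 | 32 | 0 |
| dk27 | 1 | 2 | 5 | 10/4 | 14 | 0 |
| dk512 | 1 | 3 | 6 | 24/5 | 15 | 0 |
| donfile | 2 | 1 | 7 | 24/5 | 96 | 1 |
| ex1 | 9 | 19 | 16 | 80/7 | 138 | 2 |
| ex2 | 2 | 2 | 7 | 25/5 | 72 | 1 |
| ex3 | 2 | 2 | 6 | 14/4 | 36 | 0 |
| ex4 | 6 | 9 | 11 | 18/5 | 21 | 1 |
| ex5 | 2 | 2 | 6 | 16/4 | 32 | 0 |
| ex6 | 5 | 8 | 9 | 14/4 | 34 | 1 |
| ex7 | 2 | 2 | 12 | 17/5 | 36 | 1 |
| keyb | 7 | 7 | 12 | 22/5 | 170 | 1 |
| kirkman | 12 | 6 | 18 | 48/6 | 370 | 2 |
| lion | 2 | 1 | 5 | 5/3 | 11 | 0 |
| lion9 | 2 | 1 | 6 | 11/4 | 25 | 0 |
| mark1 | 5 | 16 | 10 | 22/5 | 22 | 1 |
| mc | 3 | 5 | 6 | 8/3 | 10 | 0 |
| modulo12 | 1 | 1 | 5 | 12/4 | 24 | 0 |
| opus | 5 | 6 | 10 | 18/5 | 22 | 1 |
| planet | 7 | 19 | 14 | 86/7 | 115 | 2 |
| planet1 | 7 | 19 | 14 | 86/7 | 115 | 2 |
| pma | 8 | 8 | 14 | 49/6 | 73 | 2 |
| s1 | 8 | 7 | 14 | 54/6 | 106 | 2 |
| s1488 | 8 | 19 | 15 | 112/7 | 251 | 2 |
| s1494 | 8 | 19 | 15 | 118/7 | 250 | 2 |
| s1a | 8 | 6 | 15 | 86/7 | 107 | 2 |
| s208 | 11 | 2 | 17 | 37/6 | 153 | 2 |
| s27 | 4 | 1 | 8 | 11/4 | 34 | 1 |
| s386 | 7 | 7 | 12 | 23/5 | 64 | 1 |
| s420 | 19 | 2 | 27 | 137/8 | 137 | 4 |
| s510 | 19 | 7 | 27 | 172/8 | 77 | 4 |
| s8 | 4 | 1 | 8 | 15/4 | 20 | 1 |
| s820 | 18 | 19 | 25 | 78/7 | 232 | 4 |
| s832 | 18 | 19 | 25 | 76/7 | 245 | 4 |
| sand | 11 | 9 | 18 | 88/7 | 184 | 3 |
| shiftreg | 1 | 1 | 5 | 16/4 | 16 | 0 |
| sse | 7 | 7 | 12 | 26/5 | 56 | 1 |
| styr | 9 | 10 | 16 | 67/7 | 166 | 2 |
| tma | 7 | 9 | 13 | 63/6 | 44 | 2 |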
Table 21. Final comparative table.
| Methods | Auto | One-Hot | JEDI | $U_2$ | Our Approach | Gain, % |
|---|---|---|---|---|---|---|
| LUT counts | 1808 | 2104 | 1489 | 1323 | 1234 | +7.21 |
| Maximum operating frequency, MHz | 8127.08 | 8061.22 | 8701.97 | 8536.27 | 8658.86 | +1.42 |
| Number of FFs without output register | 251 | 2011 | 251 | 251 | 414 | −39.37 |
| Number of FFs with output register | 584 | 2344 | 584 | 584 | 414 | +41.06 |
| Power, Watts | 76.788 | 83.136 | 64.747 | 55.070 | 60.930 | −9.62 |
| Area–time products | 12,228.61 | 14,168.64 | 8859.93 | 7540.03 | 7035.28 | +7.17 |
| Power–time products, nJ | 484.24 | 528.25 | 365.05 | 301.91 | 337.11 | −10.44 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Barkalov, A.; Titarenko, L.; Krzywicki, K. Improving Hardware in LUT-Based Mealy FSMs. Appl. Sci. 2022, 12, 8065. https://doi.org/10.3390/app12168065
