Article

Improving Characteristics of FPGA-Based FSMs Representing Sequential Blocks of Cyber-Physical Systems

by Alexander Barkalov 1,2,*, Larysa Titarenko 1,3, Kazimierz Krzywicki 4,* and Svetlana Saburova 3
1 Institute of Metrology, Electronics and Computer Science, University of Zielona Gora, ul. Licealna 9, 65-417 Zielona Gora, Poland
2 Department of Computer Science and Information Technology, Vasyl Stus’ Donetsk National University (in Vinnytsia), 600-Richya Street 21, 21021 Vinnytsia, Ukraine
3 Department of Infocommunication Engineering, Faculty of Infocommunications, Kharkiv National University of Radio Electronics, Nauky Avenue 14, 61166 Kharkiv, Ukraine
4 Department of Technology, The Jacob of Paradies University, ul. Teatralna 25, 66-400 Gorzow Wielkopolski, Poland
* Authors to whom correspondence should be addressed.
Appl. Sci. 2023, 13(18), 10200; https://doi.org/10.3390/app131810200
Submission received: 10 August 2023 / Revised: 5 September 2023 / Accepted: 8 September 2023 / Published: 11 September 2023
(This article belongs to the Section Electrical, Electronics and Communications Engineering)

Abstract:
This work proposes a method for hardware reduction in circuits of Mealy finite state machines (FSMs). The circuits are implemented as networks of interconnected look-up table (LUT) elements. FSMs with twofold state assignment and encoding of output collections are discussed. The method is based on using two LUT-based cores to implement systems of partial Boolean functions. One of the cores uses only maximum binary codes, while the second core is based on extended state codes. The hardware reduction results from diminishing the number of transformed maximum binary codes. This leads to FPGA-based FSM circuits with three levels of logic blocks, where each logic block has a single level of LUTs. As a result, partial functions are represented by single-LUT circuits. The article shows a step-by-step procedure for the transition from the initial form of the FSM representation to its logic circuit (a network of programmable look-up table elements, flip-flops, and interconnects). The results of experiments conducted with standard benchmarks show that the proposed approach produces LUT-based FSM circuits with significantly better area characteristics than circuits produced by such methods as Auto and One-Hot of Vivado, JEDI, and twofold state assignment. Compared to these methods, the number of LUTs is reduced by 9.44% to 69.98%. Additionally, the maximum operating frequency is slightly improved compared with FSM circuits based on twofold state assignment (by up to 0.6%). The negative effect of these improvements is an increase in power consumption; however, this increase is very small (up to 1.56%). The gain from applying the proposed method grows as the values of the FSM’s main characteristics grow. The conditions for applying the proposed method are determined. A generalized architecture consisting of three blocks of partial functions and a method for synthesizing an FSM with this architecture are proposed, together with a method for selecting one of the seven architectures generated by the generalized architecture.

1. Introduction

Our world is characterized by the widespread distribution of various cyber-physical systems (CPSs) into all spheres of human activity [1,2,3]. Currently, intensive research is being carried out in the field of designing and ensuring the safety of the operation of CPSs [4,5,6,7,8,9]. As the name suggests, these systems include digital (cybernetic) parts interacting with physical objects [10,11,12]. Very often, these digital parts include various sequential blocks [3,11]. These blocks can implement, for example, various security algorithms [13]. To improve the overall quality of a cybernetic part, it is necessary to optimize characteristics of its sequential blocks. In the current paper, we discuss a case where the sequential blocks of digital parts are represented by finite state machines (FSMs) [14].
Very often, the models of Mealy FSMs are used for the specification of sequential blocks [14,15]. The process of FSM design requires balancing the occupied chip area, the maximum operating frequency, and the power consumption [16,17]. We discuss a case where FSM circuits are designed with field-programmable gate arrays (FPGAs). Look-up table (LUT) elements are the basic elements used for implementing FSM circuits. As follows from [18,19], the circuit area has the greatest influence on the values of the other characteristics. The area can be reduced by jointly applying various methods of structural decomposition. In our paper [20], we proposed an optimization method based on jointly applying the methods of twofold state assignment (TSA) and encoding of output collections. As a result, LUT-based FSM circuits have exactly three logic levels. Let us point out that FPGAs are very popular in modern digital systems design [5,7,8].
In this paper, we focus our attention on FPGA chips produced by AMD Xilinx [21] because this corporation is the largest manufacturer of FPGA chips. To implement an FSM circuit, we use configurable logic blocks (CLBs) that include four main components: LUTs, programmable flip-flops, dedicated multiplexers, and fast interconnections. To obtain a multi-CLB circuit, the system of inter-CLB programmable interconnects should be used. The proposed method reduces the values of LUT counts in the multi-level circuits of Mealy FSMs.
The main principle of TSA-based FSMs assumes using two types of internal state codes [20]. Each state is represented by both a maximum binary state code (MBC) and an extended state code (ESC) [20]. Such an approach allows for reducing FSM hardware compared to methods based solely on MBCs. However, the approach in [20] is connected with some overhead. Namely, an additional state transformer block should convert MBCs into ESCs. This converter consumes additional LUTs and interconnections. In this paper, we show how to reduce the noted overhead.
The main contribution of this paper boils down to the following. We have proposed: (1) a novel design method aimed at reducing the LUT counts in the circuits of FPGA-based Mealy FSMs with twofold state assignment and encoding of output collections; (2) a generalized FSM architecture including three blocks of partial Boolean functions (PBFs); (3) a method of choosing one of seven possible FSM architectures based on the generalized architecture. To reduce hardware, we propose to use at least two cores of logic [22]. The first core generates PBFs based on MBCs. The second core uses ESCs for this purpose. This approach allows for reducing hardware in the state transformer circuit because now only a part of the MBCs is transformed into ESCs. The scientific novelty of the proposed approach also includes an improvement in the known method of encoding of output collections by additional variables. The encoding is done so that each core generates some additional variables that do not occur in the other core. Thanks to this approach, the number of LUTs generating additional variables is reduced. Our current research shows that the joint usage of these two approaches leads to FSM circuits having fewer LUTs than FSM circuits based on the approach in [20]. The experimental results show that the proposed approach does not lead to significant deterioration of FSM temporal characteristics.
The remainder of the article is organized as follows. Section 2 presents the background of FPGA-based Mealy FSM design. Section 3 includes an analysis of relevant works. Section 4 is devoted to representing a main idea of the proposed method. An example of FSM synthesis is discussed in Section 5. The conducted experiments are analyzed in Section 6. A generalized FSM architecture is discussed in Section 7. Finally, Section 8 is a short conclusion that summarizes the results.

2. Background Information for FPGA-Based Mealy FSMs

A Mealy FSM has M internal states, L external inputs, and N outputs used by other blocks of a CPS. To organize interstate transitions, special internal objects are used. These include R1 state variables and R1 input memory functions (IMFs). These objects are combined into the corresponding sets S, I, O, SV, and D [14]: S = {s_1, …, s_M}, I = {i_1, …, i_L}, O = {o_1, …, o_N}, SV = {T_1, …, T_{R_1}}, and D = {D_1, …, D_{R_1}}. The sets S, I, and O uniquely follow from, for example, the FSM state transition graph (STG) [23]. However, the value of the parameter R1 is chosen by the circuit designer during the state assignment stage [23].
In the case of MBCs [24], the following formula determines the value of R1:
R_1 = \lceil \log_2 M \rceil.  (1)
Formula (1) determines the number of bits for MBCs (this is the minimum possible number for the given number of states). In the case of one-hot state assignment [24], the value of R1 is equal to the number of states ( R 1 = M ).
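To make the two assignment options concrete, the following minimal Python sketch computes R1 for maximum binary and one-hot codes; the helper name and the example values are ours, not part of the cited design flow.

```python
from math import ceil, log2

def state_code_bits(num_states: int, one_hot: bool = False) -> int:
    """R1: number of state variables needed to encode the given number of states.
    Maximum binary codes need ceil(log2(M)) bits (Formula (1));
    one-hot assignment needs one bit per state."""
    if one_hot:
        return num_states
    return max(1, ceil(log2(num_states)))

# For the FSM A1 discussed in the synthesis example (M = 9):
print(state_code_bits(9))                 # 4 bits for MBCs
print(state_code_bits(9, one_hot=True))   # 9 bits for one-hot codes
```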
The state variables T_r ∈ SV create so-called full state codes FC(s_m). Each state code bit corresponds to a flip-flop of a register RG. The register is controlled by IMFs and two special pulses, Res and Clk [25]. The pulse Res executes the initialization of the FSM operation; it sets the FSM into the initial state s_1 ∈ S. The pulse Clk determines the instant of loading a state code into RG. The r-th bit of FC(s_m) is determined by the value of D_r ∈ D. Like the vast majority of researchers, we use D flip-flops to organize the register RG [26].
The following internal resources of FPGA fabric are involved in implementing an FSM circuit: LUTs, flip-flops, programmable interconnections, a synchronization tree, and programmable input–outputs [27,28]. In this paper, we consider a case where FPGAs of AMD Xilinx [25] are used.
A LUT is a functional generator having S_L inputs and a single output [24,29]. A LUT can store the truth table of any Boolean function depending on up to S_L Boolean arguments. Nowadays, the value of S_L does not exceed 6. However, using dedicated multiplexers, the number of inputs can be increased to 8 (within a single CLB) [27]. If the number of Boolean arguments exceeds 8, then the corresponding function is represented by a multi-CLB circuit. This leads to the necessity of minimizing the number of LUTs and their levels in the resulting circuit [30,31]. In this paper, we denote by the symbol LUTer a block consisting of LUTs, multiplexers, flip-flops, and interconnections. All these elements are programmable [32].
Two systems of Boolean functions (SBFs) represent an FSM logic circuit. They are the following [17]:
D = D(SV, I);  (2)
O = O(SV, I).  (3)
These SBFs define a so-called P Mealy FSM whose architecture is shown in Figure 1 [14].
In Figure 1, the block LUTerSV implements IMFs (2). The IMFs determine the next state code (a code of the state of transition). The flip-flops of register RG are distributed among the elements of LUTerSV. The pulses Clk and Res control the operation of flip-flops. The block LUTerOF generates output functions (3).
The analysis of SBFs (2) and (3) shows that their functions depend on the variables T_r ∈ SV and i_l ∈ I. Each function f_b from (2) and (3) depends on R_b ≤ R_1 state variables and L_b ≤ L inputs. The number of LUT levels in the corresponding circuit depends on the following condition:
R_b + L_b \le S_L.  (4)
If (4) holds, then there is a single LUT in the corresponding logic circuit. The FSM circuit is single-level if condition (4) holds for each function belonging to SBFs (2) and (3). In this case, the resulting FSM circuit is characterized by the best possible values of its main characteristics. This means that this circuit requires the minimum possible chip area, that it consumes the minimum possible power, and that it represents the fastest possible solution.
Even average FSMs can have up to 10 state variables and 30 inputs [14]. Therefore, each function belonging to (2) and (3) may have up to 40 arguments. However, the number of LUT inputs is extremely small ( S L = 6 ). In this regard, the probability of violation of the condition (4) is very high. In the case of violation, various optimization methods are used to improve the characteristics of an FSM circuit. In this paper, we discuss a case where condition (4) is violated.
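As a small illustration of how condition (4) is checked in practice, here is a hedged Python sketch; the function name and the example numbers are ours.

```python
def fits_single_lut(num_state_vars: int, num_inputs: int, sl: int = 6) -> bool:
    """Condition (4): a function of R_b state variables and L_b inputs
    can be implemented by a single LUT only if R_b + L_b <= S_L."""
    return num_state_vars + num_inputs <= sl

print(fits_single_lut(4, 2))    # True: a single-LUT circuit is enough
print(fits_single_lut(10, 30))  # False: the circuit becomes multi-level
```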

3. Analysis of Related Work

Methods for improving spatial characteristics of FSM circuits are discussed in thousands of scientific works. For example, they can be found in [18,19,25,30,32,33,34,35,36,37,38]. To estimate the chip area required for a LUT-based circuit, the designers use the values of LUT counts [18]. Therefore, reducing the value of LUT count leads to a decrease in the area occupied by the circuit. This goal can be achieved using: 1. an optimal state assignment; 2. a functional decomposition (FD) of SBFs (2) and (3); and 3. a structural decomposition (SD) of the FSM logic circuit [19].
The optimal state assignment excludes some literals from the sum-of-products (SOPs) of functions (2) and (3) [39]. In the best case, this exclusion allows for implementing a single-level Mealy FSM circuit. One of the best state assignment methods is JEDI, which is distributed together with the CAD tool SIS [40]. In [41], the results of applying JEDI to FSMs from the library LGSynth93 [42] are shown. These results show that JEDI allows for excluding up to 3 literals from the SOPs (2) and (3) representing the benchmark FSMs. Therefore, using JEDI can turn multi-level circuits into single-level ones only for rather simple FSMs [32].
Using either FD or SD leads to representing SBFs (2) and (3) by systems of partial Boolean functions [34,43]. Each PBF should depend on no more than S L arguments. In this case, each PBF will be represented by a single-LUT circuit. Applying any type of decomposition produces multi-level FSM circuits. However, there is a fundamental difference in the resulting interconnection system for different decomposition methods [19]. Applying the functional decomposition leads to FSM circuits with a “spaghetti-type” irregular interconnect system. In such a system, the same inputs and state variables may appear at any place on the circuit. Let us point out that the system of interconnections has a regular character for SD-based FSM circuits. An SD-based FSM circuit consists of large blocks [19]. Each block has its unique systems of input variables and output functions, which can differ from FSM inputs i l I and state variables T r S V . Due to this, SD-based circuits have better quality than the equivalent FD-based circuits [19].
One such method is the encoding of FSM output collections (OCs) [19]. A collection O_q ⊆ O is a set of outputs o_n ∈ O that are generated simultaneously during the same interstate transition. If a particular STG has H interstate transitions, then the number of OCs, Q, ranges from 1 to H [19].
To encode Q OCs by maximum binary codes K(O_q), R2 variables are enough:
R_2 = \lceil \log_2 Q \rceil.  (5)
These variables create the set AV = {a_1, …, a_{R_2}}. There are two SBFs representing the system of FSM outputs [19]:
AV = AV(SV, I);  (6)
O = O(AV).  (7)
Applying this approach turns P Mealy FSM into PY Mealy FSM (Figure 2).
In the LUT-based PY Mealy FSM, the block LUTerSV implements SBF (2). The block LUTerAV generates the additional variables represented by SBF (6). The block LUTerOF produces the FSM outputs represented by SBF (7).
As follows from the research in [44], this approach allows for reducing the chip area necessary for generating FSM outputs compared to the case where the outputs are represented by SBF (3). However, this gain comes at the cost of a lower maximum operating frequency compared to an equivalent P Mealy FSM. To optimize the characteristics of PY Mealy FSMs, the encoding of OCs may be combined with a twofold state assignment [20], leading to P T Y Mealy FSMs. We discuss them a bit further below.
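Before moving to the twofold state assignment, the idea of OC encoding can be illustrated with a short Python sketch that extracts the distinct collections from a list of transitions and assigns them maximum binary codes of R2 bits, in the spirit of (5)–(7); the data structures and function name are ours and only mimic the idea.

```python
from math import ceil, log2

def encode_output_collections(transitions):
    """Collects the distinct output collections O_q from a list of transitions
    (each transition carries the set of outputs it generates) and assigns each
    collection a maximum binary code of R2 = ceil(log2(Q)) bits."""
    collections = []
    for outputs in transitions:
        oc = frozenset(outputs)
        if oc not in collections:
            collections.append(oc)
    q = len(collections)
    r2 = max(1, ceil(log2(q)))
    codes = {oc: format(idx, f"0{r2}b") for idx, oc in enumerate(collections)}
    return r2, codes

# A toy fragment: four transitions producing three distinct collections.
r2, codes = encode_output_collections([{"o1", "o7"}, {"o4"}, {"o1", "o7"}, set()])
print(r2, codes)   # 2 bits are enough for Q = 3 collections
```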
To execute the TSA, we should find a partition π_S of the set S into K classes. Each class includes compatible states. States s_m, s_j ∈ S are compatible if their inclusion in the same class of the partition π_S does not lead to the following phenomenon: the required number of LUT inputs exceeds the number of LUT inputs S_L. Why such a phenomenon is possible will become clear later in the article. Three sets characterize any class S_k ∈ π_S. These sets consist of: 1. inputs determining transitions from states s_m ∈ S_k (a set I_k ⊆ I including L_k elements); 2. outputs produced during the transitions from these states (a set O_k ⊆ O); and 3. IMFs determining MBCs of transition states (a set D_k ⊆ D). If the encoding of OCs is used, then the set O_k ⊆ O is replaced by the set AV_k ⊆ AV. The set AV_k includes the additional variables equal to 1 in the codes of OCs generated during the transitions from states s_m ∈ S_k.
Each class S_k ∈ π_S includes M_k compatible states s_m ∈ S. Inside each class, the states are encoded by partial codes PC(s_m). These codes have R_k bits:
R_k = \lceil \log_2 (M_k + 1) \rceil.  (8)
To create the partial codes, a set ASV of additional state variables is created. The states s_m ∈ S_k are encoded using the variables v_r ∈ ASV_k. The sets ASV_k create the set ASV, which includes R3 elements:
R_3 = R_1 + \dots + R_K.  (9)
If a state s_m ∈ S is compatible with the states s_p ∈ S_k, then including this state into S_k must satisfy the condition:
R_k + L_k \le S_L \quad (k \in \{1, \dots, K\}).  (10)
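A possible greedy way of building such a partition is sketched below in Python; this is our own illustration of condition (10), not the exact heuristic used in [20].

```python
from math import ceil, log2

def greedy_partition(states, inputs_of, sl=6):
    """Greedily groups states into classes of compatible states.
    A state joins a class only while condition (10) holds:
    ceil(log2(M_k + 1)) + L_k <= S_L, where L_k is the number of distinct
    inputs determining transitions from the states of the class."""
    classes = []
    for s in states:
        for cls in classes:
            members = cls["states"] | {s}
            inputs = cls["inputs"] | set(inputs_of[s])
            if ceil(log2(len(members) + 1)) + len(inputs) <= sl:
                cls["states"], cls["inputs"] = members, inputs
                break
        else:
            classes.append({"states": {s}, "inputs": set(inputs_of[s])})
    return [sorted(cls["states"]) for cls in classes]

# A toy example: each state is labelled with the inputs used in its transitions.
print(greedy_partition(["s1", "s2", "s3"],
                       {"s1": ["i1"], "s2": ["i1", "i2"], "s3": ["i3", "i4"]},
                       sl=4))   # [['s1', 's2'], ['s3']]
```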
This approach leads to a P T Y Mealy FSM. In P T Y Mealy FSMs, each state s_m ∈ S has two codes. One of them is a maximum binary full state code FC(s_m), and the second is a partial state code PC(s_m). The second code determines a particular state as an element of a particular class.
Each class S_k ∈ π_S determines the following two systems of PBFs:
D^k = D^k(ASV_k, I_k);  (11)
AV^k = AV^k(ASV_k, I_k).  (12)
To obtain the final values of additional variables and IMFs, the following SBFs should be created:
D_r = \bigvee_{k=1}^{K} D_r^k \quad (r \in \{1, \dots, R_1\});  (13)
AV_r = \bigvee_{k=1}^{K} AV_r^k \quad (r \in \{1, \dots, R_2\}).  (14)
Next, the codes of the OCs should be transformed into FSM outputs. The outputs are represented by SBF (7). Additionally, the full state codes should be transformed into the corresponding partial codes. The transformation is represented by the following SBF:
ASV = ASV(SV).  (15)
SBFs (11) and (12) define the first level of a P T Y Mealy FSM circuit. SBFs (13) and (14) determine its second level. Finally, SBFs (7) and (15) represent the third circuit level. The architecture of a P T Y Mealy FSM is shown in Figure 3.
In this architecture, the block LUTerk generates PBFs (11) and (12). The block LUTerPF implements the system of disjunctions (13) and (14). This block includes the distributed RG controlled by the pulses Clk and Res. The block LUTerOF implements the outputs represented by SBF (7). The block LUTerASV implements SBF (15). Therefore, it executes the transformation of state codes.
Our previous research [20] shows that the LUT-based circuits of P T Y FSMs have better characteristics than the circuits of equivalent PY FSMs. If the conditions
K \le S_L,  (16)
R_2 \le S_L  (17)
hold, then the circuits of P T Y FSMs are three-level and are faster than the equivalent PY Mealy FSMs.
Let us represent the circuit (Figure 3) as a combination of a core of partial functions (CorePF) and a functional transformer. The core includes the blocks LUTer1, …, LUTerK. The functional transformer includes all other blocks shown in Figure 3. This leads to the generalized diagram of a P T Y FSM (Figure 4).
Analysis of the generalized diagram shows the following peculiarity: the transformation of full codes into partial codes PC(s_m) is executed for all FSM states. However, there are cases when there is no need for the code transformation. If, for some state s_m ∈ S, condition (4) holds, then, for this state, all PBFs are represented by single-LUT circuits. If we take this property into account, we can reduce the cardinality of the partition π_S. Additionally, the number of state variables R3 can be reduced as compared to its value for the equivalent P T Y FSM. In this paper, we propose a method based on taking the mentioned property into account.

4. Analysis of Our Current Approach

The transitions from a state s_m ∈ S are determined by elements of a set I(s_m) ⊆ I. There are L(s_m) ≤ L elements in the set I(s_m). If the condition
L(s_m) + R_1 \le S_L  (18)
holds, then a single LUT is enough to represent the circuit of any PBF generated during the transitions from s_m ∈ S. Therefore, for such states, it makes sense to use the full state codes for generating PBFs. If condition (18) is violated, then the corresponding codes FC(s_m) should be transformed into partial codes. This allows for creating a class of states S_0 ⊆ S whose maximum binary codes do not require the transformation. Therefore, the partition based on (10) should be constructed only for the states s_m ∉ S_0.
Based on the above-mentioned statement, we propose to use the ideas from our paper [22]. First of all, we should divide the set S into the disjoint sets S_0 and S_1 = S \ S_0. If a state s_m ∈ S satisfies condition (18), then this state is included in the set S_0. The states s_m ∈ S_0 create the block CoreFC. Otherwise, the state s_m ∈ S belongs to the set S1. The states s_m ∈ S_1 form the block CorePC. Obviously, only the codes of states s_m ∈ S_1 should be transformed.
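A minimal Python sketch of this division is shown below (our own illustration of condition (18); the names are hypothetical, and the toy data mirror the example of Section 5).

```python
from math import ceil, log2

def split_states(l_of_state, num_states, sl=6):
    """Divides the states into S0 (full codes are enough, condition (18) holds)
    and S1 = S \\ S0 (codes must be transformed into partial ones).
    l_of_state maps each state to L(s_m), the number of inputs
    determining transitions from it."""
    r1 = max(1, ceil(log2(num_states)))
    s0 = {s for s, l_sm in l_of_state.items() if l_sm + r1 <= sl}
    s1 = set(l_of_state) - s0
    return s0, s1

# A toy FSM with 9 states and S_L = 5: states with one transition input stay in S0.
l_values = {"s1": 1, "s2": 2, "s3": 1, "s4": 1, "s5": 2,
            "s6": 1, "s7": 1, "s8": 2, "s9": 2}
print(split_states(l_values, num_states=9, sl=5))
```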
CoreFC determines the sets I1 ⊆ I, AV1 ⊆ AV, AV0 ⊆ AV, and D0 ⊆ D. The inputs i_l ∈ I1 cause the transitions from the states creating the CoreFC. The set AV1 consists of the additional variables a_r ∈ AV produced only during the transitions from the states creating the CoreFC. The set AV0 consists of the additional variables produced by both FSM cores. The set D0 includes the functions D_r ∈ D produced during the transitions creating the CoreFC. Therefore, the circuit of CoreFC is determined by the following SBFs:
D0 = D0(SV, I1);  (19)
AV1 = AV1(SV, I1);  (20)
AV0 = AV0(SV, I1).  (21)
To synthesize CorePC, it is necessary to create the partition π_{S2} = {S_1, …, S_J} of the set S1. This can be done using the same approach as the one creating π_S. CorePC determines the sets I2 ⊆ I and AV2 ⊆ AV. Their purpose is clear from the previous analysis.
Three sets (I_{PC}^j, AV_{PC}^j, D_{PC}^j) are determined by each class S_j of the partition π_{S2}. Their meaning follows from the previous text. The state variables from the set ASV2 encode the states s_m ∈ S1. The codes of states s_m ∈ S_j are created from elements of the set ASV2_j ⊆ ASV2. There are R4 elements in the set ASV2 (R_4 = R_1 + R_2 + \dots + R_J). The following SBFs determine the circuit of CorePC:
D_{PC}^j = D_{PC}^j(ASV2_j, I_{PC}^j);  (22)
AV_{PC}^j = AV_{PC}^j(ASV2_j, I_{PC}^j).  (23)
To generate the final values of additional variables, FSM outputs, and state variables, we should use the functional transformer. This block is similar to the one used in the P T Y FSM (Figure 3). Using this information, we propose to transform P T Y FSMs into P 2 T Y Mealy FSMs (Figure 5).
In the proposed two-core FSM, the block CoreFC implements SBFs (19)–(21). The block CorePC implements SBFs (22) and (23). The block LUTerFA is a functional assembler implementing the following disjunctions:
D_r = \bigvee_{j=1}^{J} D_r^j \quad (r \in \{1, \dots, R_1\});  (24)
AV2_r = \bigvee_{j=1}^{J} AV_r^j \quad (r \in \{1, \dots, R_2\}).  (25)
The block LUTerFA includes a distributed full state code register whose informational inputs are connected with the IMFs (24). The register is controlled by the pulses Clk and Res. The block LUTerOF implements SBF (7), where AV = AV1 ∪ AV2. The block LUTerASV2 implements the SBF:
ASV2 = ASV2(SV).  (26)
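The assembling performed by LUTerFA is a plain disjunction of partial functions; a hedged Python illustration of (24) and (25), with hypothetical inputs, is given below.

```python
def lut_fa_assemble(partials):
    """LUTerFA: every full function is the OR of its partial counterparts
    produced by CoreFC and the blocks of CorePC, cf. (24) and (25).
    `partials` maps a function name to the current values of its partial terms."""
    return {name: int(any(values)) for name, values in partials.items()}

# e.g. D1 = D1^0 v D1^1 v D1^2 and a1 = a1^0 for one clock cycle:
print(lut_fa_assemble({"D1": [0, 1, 0], "a1": [1]}))   # {'D1': 1, 'a1': 1}
```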
Let us analyze the proposed solution. The partition π_{S2} has J classes. Obviously, the following conditions take place:
J \le K;  (27)
R_4 \le R_3.  (28)
Due to the validity of condition (27), we can state that the circuit of the P 2 T Y Mealy FSM (Figure 5) is not slower than the circuit of the equivalent P T Y FSM (Figure 3). Due to the validity of condition (28), we can state that the circuit of CorePC of the P 2 T Y FSM should require fewer LUTs than the block CorePF of the equivalent P T Y FSM. The same is true for the block LUTerASV of the equivalent P T Y and P 2 T Y FSMs. Therefore, we can expect that a circuit of a P 2 T Y FSM (Figure 5) requires a smaller area and is not slower than the circuit of the equivalent P T Y FSM (Figure 3). These assumptions have been confirmed by the studies whose results are given in Section 6.
Let us show the features of our method in comparison with the methods proposed in [20,22]. In the article [20], we discussed P T Y FSMs with twofold state assignment and encoding of output collections. The P 2 T Y FSMs have the following differences. First, in P T Y FSMs, the codes of all states are converted, while in P 2 T Y FSMs, only a part of the codes is converted. This allows for optimizing the code converter circuit (compared to the circuit used in equivalent P T Y FSMs). Secondly, the use of two cores allows us to encode OCs such that some variables a_r ∈ AV are generated only by the LUTs of CoreFC. This allows for reducing the number of LUTs generating output signals (compared to this number for equivalent P T Y FSMs). In the article [22], we discussed so-called P 2 C FSMs, where two cores of LUTs are used. However, P 2 C FSMs are based on one-hot encoding of outputs. In P 2 T Y FSMs, we use maximum binary codes of output collections. This allows for reducing the number of LUTs generating output signals (compared to this number for equivalent P 2 C FSMs).
In this paper, we propose a synthesis method aimed at LUT-based P 2 T Y Mealy FSMs. The synthesis process starts from the FSM state transition graphs [17]. Next, these graphs are transformed into equivalent state transition tables (STTs) [17]. The sequence of steps of the proposed method is the following:
  • Creating an STT of P Mealy FSM.
  • Pre-formation of sets S 0 and S1.
  • Pre-formation of partition π S 2 of set S1.
  • Final formation of sets S 0 and S1 and partition π S 2 .
  • Creating full state codes F C ( s m ) .
  • Encoding of output collections O q O and finding SBF (7).
  • Creating a table of CoreFC and deriving SBFs (19)–(21).
  • Encoding of states s m S j by partial state codes P C ( s m ) .
  • Generating tables describing the blocks of CorePC and deriving systems (22) and (23).
  • Creating a table of LUTerFA and SBFs (24) and (25).
  • Creating a table of LUTerASV and systems (26).
  • Creating the P 2 T Y Mealy FSM circuit.
To show that the model of P 2 T Y FSM is used to synthesize FSM A, we use the symbol P 2 T Y ( A ) . Let us explain how to execute the steps of the proposed design method.

5. Synthesis Example

We discuss a synthesis example for Mealy FSM A1 (Figure 6). To implement the FSM circuit, we use LUTs with S L = 5 .
The FSM states correspond to the STG vertices [17]. To show interstate transitions, the vertices are connected by arcs. An STG includes H arcs. The h-th arc (h ∈ {1, …, H}) is marked by a pair ⟨I_h, O_h⟩. In this pair, the symbol I_h stands for a conjunction of either FSM inputs i_l ∈ I or their complements. This is an input signal. The set O_h ⊆ O includes the FSM outputs o_n ∈ O generated during the transition number h.
The STG (Figure 6) determines the following sets: S = {s_1, …, s_9}, I = {i_1, …, i_7}, and O = {o_1, …, o_7}. Therefore, the FSM A1 is characterized by M = 9 and L = N = 7. There are 22 arcs in the initial STG. This gives 22 transitions among the states of FSM A1.
Step 1. This step is omitted if an FSM is represented by STT. The transformation is executed in the following way [14]. The STT includes H lines. Each line corresponds to an STG arc. Each transition is characterized by its current state s C , the next state s T , inputs I C T (for the h-th arc, this is the signal I h ), outputs O C T (for the h-th arc, this is the OC O h ), and h. Therefore, each arc determines the columns s C , s T , I C T , O C T , and h. Table 1 is an STT of A1.
This table uniquely corresponds to the STG (Figure 6). We add the column q into Table 1 to show the subscripts of output collections.
Step 2. The following values of L(s_m) can be found from the analysis of Table 1: L(s_m) = 1 for the states s_1, s_3, s_4, s_6, s_7; L(s_m) = 2 for the states s_2, s_5, s_8, s_9. Additionally, M = 9. Using (1) gives R_1 = 4. As follows from the initial conditions of the example, S_L = 5. Therefore, condition (18) takes place for the states with L(s_m) = 1. Thus, the following sets can be created: S_0 = {s_1, s_3, s_4, s_6, s_7} and S_1 = {s_2, s_5, s_8, s_9}. As follows from our analysis, some states may be transferred from S_0 to S_1. Thus, the elements of these sets can be changed. From Table 1, we can find the sets I1 = {i_1, i_2, i_3} and I2 = {i_2, i_3, i_5, i_6, i_7}.
Step 3. Using the known approach [20], we can find the partition π_{S2} = {S_1, S_2} of the set S1. It includes the classes S_1 = {s_2, s_5} and S_2 = {s_8, s_9}. Because the set S1 is a preliminary one, this partition is also preliminary. Each class includes M_j = 2 elements. Using (8) gives the following relation: R_1 = R_2 = 2. Summing these values (cf. (9)) gives R_4 = 4. Therefore, there is a set of state variables ASV2 = {v_1, …, v_4}.
Step 4. The classes S j π S 2 determine the following sets of inputs: I 1 = { i 2 , i 5 , i 6 } and I 2 = { i 3 , i 5 , i 7 } . Therefore, we have L 1 = L 2 = 3 . This means we cannot add new inputs in these sets due to violation of condition (10). Each set S j π S 2 can include up to 3 elements without violation of (10). Therefore, one additional state can be added to each of the sets S j π S 2 .
The method of state redistribution is discussed in detail in the paper [22]. In our current paper, we just show the result of redistribution, which is the following: S 0 = { s 1 , s 3 , s 4 } and S 1 = { s 2 , s 5 , s 6 , s 7 , s 8 , s 9 } . The redistribution gives the following classes: S 1 = { s 2 , s 5 , s 7 } and S 2 = { s 6 , s 8 , s 9 } . Now, we obtain M 1 = M 2 = 3 . Using these values and Formula (8), we can see that R 1 = R 2 = 2 and R 4 = 4 . Therefore, the total number of state variables v r A S V 2 does not change, but now the set S 0 includes fewer elements. Now we can expect a decrease in the value of the LUT count for the circuit of CoreFC.
Step 5. There are M = 9 elements in the set S. Therefore, using (1) gives R_1 = 4. This value determines the sets SV = {T_1, …, T_4} and D = {D_1, …, D_4}. As shown in [17], it is necessary to cover the states from the same class using the minimum possible number of generalized cubes of the R1-dimensional Boolean space. Such an outcome decreases the number of literals in functions (19)–(21). One of the possible outcomes is shown in Figure 7. To encode the states by MBCs, we used the algorithm JEDI [40].
As we can see from the analysis of the resulting Karnaugh map (Figure 7), the states s m S 0 are covered by the generalized cube 00xx. The states s m S 1 are represented by the generalized cube x100. The cube 1x00 covers the states s m S 2 . Therefore, for our example, each class is placed into a single generalized cube.
Step 6. The analysis of Table 1 gives Q = 10 output collections. They are the following: O_1 = ∅, O_2 = {o_1, o_7}, O_3 = {o_4}, O_4 = {o_3}, O_5 = {o_2, o_6}, O_6 = {o_1, o_4}, O_7 = {o_5}, O_8 = {o_2}, O_9 = {o_1, o_5, o_7}, and O_{10} = {o_4, o_5, o_6}. Using (5) gives R_2 = 4 and the set AV = {a_1, …, a_4}.
Each literal in the sum-of-product (SOP) of a Boolean function corresponds to an interconnection between the input source and a corresponding LUT. To reduce the number of interconnections, the number of literals in SOPs should be decreased. To encode the output collections, we used the methods presented in classical work [17]. Using the approach from [17] gives the codes shown in Figure 8.
We encoded the OCs in a way where the variable a 1 A V is generated only by one LUT of CoreFC. To do this, we have analyzed Table 1. The analysis of Table 1 shows that the following OCs are generated during the transitions from states s m S 0 : O 4 , O 5 , O 8 , and O 10 . Therefore, we have divided the Karnaugh map (Figure 8) into two parts. The first part corresponds to a 1 = 0 , and the second part corresponds to a 1 = 1 . We have placed the OCs O 4 , O 5 , O 8 , and O 10 into the second part. Now, we can obtain the following system of functions:
o_1 = O_2 \vee O_6 \vee O_9 = a_2; \; o_2 = O_5 \vee O_8 = a_1 \bar{a}_3; \; o_3 = O_4 \vee O_{10} = a_1 a_3; \; o_4 = O_3 \vee O_6 = \bar{a}_1 a_4; \; o_5 = O_7 \vee O_9 \vee O_{10} = a_3 \bar{a}_4; \; o_6 = O_5 \vee O_{10} = a_1 \bar{a}_4; \; o_7 = O_2 \vee O_9 = a_2 \bar{a}_4.  (29)
The SBF (29) determines the circuit of LUTerOF. The function o_1 is represented by a corresponding output of LUTerFA. Therefore, the circuit of LUTerOF consists of 6 LUTs. Analysis of system (29) shows that there are 12 literals in the SOPs of the implemented functions. This determines 12 interconnections between LUTerOF and other circuit blocks. Using the results of [19] gives the maximum number of interconnections; in our case, it is equal to R_2 · N = 28. Thus, due to using the proposed approach, the number of interconnects is reduced by 2.33 times.
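As a sanity check of (29), the small Python sketch below decodes a collection code a1…a4 back into the outputs o1…o7; this is our own illustration, with the literal polarities taken directly from (29).

```python
def decode_outputs(a1, a2, a3, a4):
    """Output decoder LUTerOF described by SBF (29): the FSM outputs o1..o7
    are restored from the bits a1..a4 of an output-collection code."""
    return {
        "o1": a2,
        "o2": a1 & (1 - a3),
        "o3": a1 & a3,
        "o4": (1 - a1) & a4,
        "o5": a3 & (1 - a4),
        "o6": a1 & (1 - a4),
        "o7": a2 & (1 - a4),
    }

# The code K(O4) = 1011 from the example should yield exactly O4 = {o3}:
print([o for o, v in decode_outputs(1, 0, 1, 1).items() if v])   # ['o3']
```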
Step 7. To construct the table of CoreFC, it is necessary to select the lines of the STT with transitions from the states s_m ∈ S_0. In the discussed case, we should select lines 1–2 and 6–9 of the STT (Table 1). The table of CoreFC includes 5 additional columns (compared to the baseline STT). These columns are: FC(s_C), FC(s_T), D_h^0, AV_h^0, and AV_h^1. The meaning of the columns FC(s_C) and FC(s_T) is self-explanatory. The column D_h^0 includes the IMFs creating the code FC(s_T) (to load it into the code register). The column AV_h^0 includes the additional variables a_r ∈ AV equal to 1 in the codes of the generated OCs; these variables are also produced by some blocks of CorePC. The column AV_h^1 includes the additional variables a_r ∈ AV generated only by the block CoreFC; obviously, these variables are not produced by any block of CorePC. Table 2 represents the block CoreFC for the given example.
The columns A V h 0 and A V 1 h are created in the following manner. For example, there is an OC O 4 written in line 1 of Table 1. Analysis of Figure 8 gives the code K ( O 4 ) = 1011 . This code determines the variables a 1 , a 3 , and a 4 . Therefore, the first line of Table 2 includes the variables a 3 and a 4 in the column A V h 0 , as well as the variable a 1 in the column A V 1 h . All other lines of Table 2 are created using a similar approach.
Using Table 2, we can obtain SBFs (19)–(21). For example, the function a 1 A V is represented as the following:
a_1 = \bar{T}_1 \bar{T}_2 \bar{T}_3 \vee \bar{T}_1 \bar{T}_2 T_3 i_1.  (30)
The block CoreFC determines the set D0 = D. We will show the SOPs for the functions D_1^0 and a_3^0 a bit later.
Step 8. The codes of the states s_m ∈ S_1 use the variables v_1, v_2 ∈ ASV2. The codes of the states s_m ∈ S_2 are based on the variables v_3, v_4 ∈ ASV2. The code combination v_1 = v_2 = 0 indicates that a particular state belongs to a class other than S_1. The code combination v_3 = v_4 = 0 indicates that a particular state belongs to a class other than S_2. Due to the fulfillment of condition (10), the codes do not affect the number of LUTs in the circuit of CorePC. Therefore, the partial state codes can be arbitrary. We have chosen the following approach: the smaller the subscript (m) of a state, the more zeros its partial code contains. The obtained partial state codes are shown in Figure 9.
Using Figure 9, we can obtain the following partial codes: P C ( s 2 ) = P C ( s 6 ) = 01 , P C ( s 5 ) = P C ( s 8 ) = 10 , and P C ( s 7 ) = P C ( s 9 ) = 11 . Using them allows for creating tables representing CorePC.
Step 9. The block CorePC includes two blocks of LUTs. The block CorePC(S_1) corresponds to the set S_1, whereas the block CorePC(S_2) corresponds to the set S_2. The table of CorePC(S_1) (Table 3) is based on lines 3–5, 10–12, and 15–16 (Table 1). Table 4 represents the block CorePC(S_2). The table is constructed using lines 13–14 and 17–22 of the initial STT.
In these tables, the current states are represented by their partial codes P C ( s C ) ; the states of transition are represented by their full codes F C ( s T ) . The column O C T of STT is replaced by the columns A V h 1 and A V h 2 , respectively. These columns include additional variables equal to 1 in the codes of the OCs.
A straightforward approach is used to construct SBFs (22) and (23). For example, the functions D_1^0, D_1^1, and D_1^2 are represented as:
D_1^0 = \bar{T}_1 \bar{T}_2 T_4 i_1; \; D_1^1 = \bar{v}_1 v_2 \bar{i}_2 i_5 \vee v_1 v_2 \bar{i}_2; \; D_1^2 = v_3 \bar{v}_4 \bar{i}_3 \bar{i}_7 \vee v_3 v_4 \bar{i}_5 \bar{i}_7.  (31)
In the same way, we can obtain the following SOPs:
a_3^0 = \bar{T}_1 \bar{T}_2 \bar{T}_3 \bar{T}_4 i_1 \vee \bar{T}_1 \bar{T}_2 \bar{T}_3 \bar{T}_4 \bar{i}_1; \; a_3^1 = v_1 \bar{v}_2 \bar{i}_5 \vee v_1 v_2 i_2; \; a_3^2 = v_3 \bar{v}_4 i_3 \vee v_3 \bar{v}_4 \bar{i}_3 \bar{i}_7 \vee v_3 v_4 \bar{i}_5 \bar{i}_7.  (32)
Step 10. The block LUTerFA is based on Table 5. Table 5 includes the following columns: Function (the assembled function produced by LUTerFA), CoreFC, and CorePC. If some function belonging to the set D ∪ AV is generated by a LUT of the block CoreFC, then there is a 1 at the intersection of the row containing this function and the column CoreFC. The opposite situation is marked by 0. The column CorePC is divided into J subcolumns corresponding to the classes S_j (j ∈ {1, …, J}). The same principle applies to placing either 1 or 0 in the rows of this part of Table 5.
We use Table 2 to fill the rows of column CoreFC of Table 5. To fill the rows of subcolumn S 1 ( S 2 ) , we use Table 3 (Table 4).
Table 5 determines the R1 + R2 disjunctions of partial Boolean functions. The following disjunctions represent the circuit of the block LUTerFA:
D_1 = D_1^0 \vee D_1^1 \vee D_1^2; \; D_2 = D_2^0 \vee D_2^1 \vee D_2^2; \; D_3 = D_3^0 \vee D_3^1 \vee D_3^2; \; D_4 = D_4^0 \vee D_4^1 \vee D_4^2; \; a_1 = a_1^0; \; a_2 = a_2^1 \vee a_2^2; \; a_3 = a_3^0 \vee a_3^1 \vee a_3^2; \; a_4 = a_4^0 \vee a_4^1 \vee a_4^2.  (33)
Step 11. The block LUTerASV transforms the full codes FC(s_m) into the partial state codes PC(s_m). This transformation is not executed for the states s_m ∈ S_0. The table of LUTerASV includes the following columns: s_m, FC(s_m), PC(s_m), and ASV_m. The last column includes the symbols of the additional state variables equal to 1 in the code PC(s_m). In the discussed case, the full state codes are taken from Figure 7; the partial state codes are taken from Figure 9. Using these codes, we can create Table 6.
Obviously, using Table 6 gives us the perfect SOPs [17] of SBF (26). To minimize the number of interconnections between the blocks LUTerFA and LUTerASV, we transform Table 6 into a multi-functional Karnaugh map (Figure 10).
Figure 10 is based on Figure 7. This transformation is done in an obvious way. We have simply replaced the symbols of states from Figure 7 with symbols of corresponding additional variables. Additionally, the codes of states s m S 0 are “do not care” code combinations. Using Figure 10 gives us the following SBF:
v_1 = T_2 T_4 \vee T_2 T_3; \; v_2 = T_2 \bar{T}_4; \; v_3 = T_1 T_4 \vee T_1 T_3; \; v_4 = T_1 \bar{T}_4.  (34)
There are 10 literals in SBF (34). If each function from (26) is represented by its perfect SOP, then these SOPs have R_1 · R_4 = 16 literals. Therefore, using the multi-functional Karnaugh map allows for reducing the number of interconnections by 1.6 times. As shown in [31], the fewer interconnections a circuit has, the less power it consumes.
Step 12. During this step, various technology mapping procedures should be executed [45,46]. If the FPGA chip used is produced by AMD Xilinx, then their CAD tool Vivado [47] should be applied for implementing an FSM circuit. In the next section, we show some results based on using this CAD package to implement FSM circuits. Experiments allow us to compare the effectiveness of the proposed method in relation to some known methods.
At the end of this section, we will show how to estimate the hardware amount in the circuits of FSMs P 2 T Y ( A 1 ) and P T Y ( A 1 ) . We start from FSM P 2 T Y ( A 1 ) . To find the LUT counts for circuits of CoreFC, CorePC (the first logic level), and LUTerFA (the second logic level), it is necessary to analyze Table 5 (the table of LUTerFA). Each symbol “1” in this table corresponds to a LUT from the first logic level. In the table, there are 21 “1” symbols. Therefore, the first-level circuits consist of 21 LUTs. If a row of the table includes more than a single 1, then this row corresponds to a LUT from the second logic level. There are 7 LUTs in the circuit of LUTerFA. This can be found from Table 5. To find the LUT counts for blocks LUTerOF and LUTerASV creating the third logic level, we should analyze SOPs (29) and (34), respectively. If an SOP includes at least two literals, then it determines a LUT of the third logic level. As follows from (29), there are 6 such SOPs. The analysis of (34) shows that the system includes 4 such SOPs. Therefore, the third logic level includes 10 LUTs. Summing up the number of LUTs for different levels, we see that the circuit of FSM P 2 T Y ( A 1 ) includes 21 + 7 + 10 = 38 LUTs.
To estimate the number of LUTs in the circuit of FSM P T Y ( A 1 ) , it is necessary to find the compatibility classes for the set of states. Using the approach [20] gives the partition with K = 3 . There are the following relations between the classes of π S and π S 2 : S 1 = S 0 , S 2 = S 1 , and S 3 = S 2 . This means that the table of LUTerPF (FSM P T Y ( A 1 ) ) is the same as Table 5. This gives 21 LUTs for the first logic level consisting of the blocks LUTer1LUTer3. Also, there are 7 LUTs in the circuit of LUTerPF. The blocks LUTerOF are the same for both FSMs (each of which includes 6 LUTs). However, there is R 3 = 6 . This gives 6 LUTs in the block LUTerASV. In total, 12 LUTs create the third logic level of FSM P T Y ( A 1 ) . Summing up the number of LUTs for different levels, we see that there are 21 + 7 + 12 = 40 LUTs in the circuit of P T Y ( A 1 ) .
Therefore, for such a simple FSM, we see a gain of 5.3% due to the transition from P T Y ( A 1 ) to P 2 T Y ( A 1 ) . For more complex FSMs, the gain can be much higher. This statement is confirmed by the results of the research shown in the next section.
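The counting rules used above can be captured in a few lines of Python; this is our own sketch with hypothetical data structures (a 0/1 matrix standing for a table such as Table 5 and a list of literal counts for the third-level SOPs), not a tool used in the paper.

```python
def estimate_lut_count(fa_table, third_level_literal_counts):
    """Estimates LUT counts per logic level:
    every '1' in the LUTerFA table is a first-level LUT, every row with more
    than one '1' needs a second-level (assembling) LUT, and every third-level
    SOP with at least two literals needs its own LUT."""
    level1 = sum(sum(row) for row in fa_table)
    level2 = sum(1 for row in fa_table if sum(row) > 1)
    level3 = sum(1 for lits in third_level_literal_counts if lits >= 2)
    return level1, level2, level3, level1 + level2 + level3

# A toy table with three assembled functions and four third-level SOPs:
print(estimate_lut_count([[1, 1, 0], [1, 0, 1], [1, 0, 0]], [2, 1, 3, 2]))
# -> (5, 2, 3, 10)
```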

6. Experimental Results

As a basis for comparing the efficiency of different synthesis methods, we use the benchmark FSMs from the library [42]. The library includes 48 benchmarks of varying complexity (numbers of states, inputs, outputs, output collections, and interstate transitions). The STTs of benchmark FSMs are represented using the format KISS2. These benchmarks have been used by different designers as a representative sample to compare the main characteristics of proposed and known FSM circuits [33,34,36]. To give an idea of the complexity of these benchmarks, we show their characteristics in [19,42].
As a rule, in research, FSMs are considered as stand-alone units. In this case, the stability of the output signals is not one of the main design problems. However, in our current paper, we consider Mealy FSMs as some parts of digital systems. As follows, for example, from [14], Mealy FSMs are unstable. This means that input fluctuations result in output fluctuations. The output fluctuations can cause operation failure in a digital system. Output stabilization can be achieved due to using a synchronous input register (AIR) [19]. The following is a principle of interaction of an FSM and other digital system blocks (Figure 11).
The system outputs are treated as FSM inputs forming the set I. As long as there are transients in the digital system, the synchronization signal Clk1 is equal to zero. This actually disconnects the FSM from other system blocks. When system outputs are stable, they are loaded into the AIR. Due to this, fluctuations in the system outputs do not affect the FSM output values. Of course, there is some overhead connected with this approach. Obviously, AIR consumes additional resources of the FPGA fabric. It also consumes some additional power and increases the value of FSM cycle time. Therefore, we took into account this overhead in our research.
In experiments, we use the Virtex-7 VC709 Evaluation Platform (xc7vx690tffg1761-2) [38]. Its FPGA chip xc7vx690tffg1761-2, produced by AMD Xilinx, is a base for implementing FSM circuits. For LUTs of this chip, there is S L = 6 . The step of technology mapping is executed by the CAD tool Vivado v2019.1 (64-bit) [47]. To create tables with experimental results, we use data from the reports produced by Vivado. The VHDL-based FSM models are used to connect the benchmarks with Vivado. We use the CAD tool K2F [10] to create VHDL codes corresponding to initial KISS2-based benchmark files.
From the Vivado reports, we have derived the following characteristics of P 2 T Y Mealy FSM circuits: the number of LUTs (LUT count), value of cycle time, maximum operating frequency, and power consumption. As a basis for comparison, we have chosen four different FSMs. They are the following: 1. P Mealy FSMs with MBCs produced by the Auto method of Vivado; 2. P Mealy FSMs with one-hot state codes produced by the One-Hot method of Vivado; 3. JEDI-based P Mealy FSMs; and 4. P T Y -based FSMs with twofold state assignment [20]. We did not compare P 2 T Y and PY Mealy FSMs. This is because P T Y FSMs have better characteristics than equivalent PY Mealy FSMs [20]. Therefore, if the proposed approach allows for improving characteristics compared to P T Y , then the results obtained will obviously be better than the results for equivalent PY Mealy FSMs.
As follows from [19,42], the values of LUT counts and other characteristics of LUT-based FSM circuits strongly depend on the relation between the values of L + R_1 and S_L. In the discussed case, S_L = 6. The benchmarks used have 5 complexity levels (C0–C4). These levels are determined in the following order. The benchmarks have the level C0 if R_1 + L ≤ 6. The level C0 determines trivial FSMs. The benchmarks have the level C1 if 6 < R_1 + L ≤ 12. The level C1 determines simple FSMs. The benchmarks have the level C2 if 12 < R_1 + L ≤ 18. The level C2 determines average FSMs. The benchmarks have the level C3 if 18 < R_1 + L ≤ 24. The level C3 determines big FSMs. The benchmarks have the level C4 if R_1 + L > 24. The level C4 determines very big FSMs.
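For reference, this classification can be written as a small Python helper; the sketch is ours, and the thresholds are the ones listed above for S_L = 6.

```python
def complexity_level(r1: int, l: int) -> str:
    """Complexity levels C0-C4 of a benchmark FSM, based on R1 + L."""
    s = r1 + l
    if s <= 6:
        return "C0 (trivial)"
    if s <= 12:
        return "C1 (simple)"
    if s <= 18:
        return "C2 (average)"
    if s <= 24:
        return "C3 (big)"
    return "C4 (very big)"

print(complexity_level(4, 7))    # the FSM A1 from Section 5: R1 + L = 11 -> C1
print(complexity_level(10, 30))  # an FSM with 40 arguments -> C4
```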
The results of experiments are shown in Table 7 (the LUT counts), Table 8 (the minimum cycle times), Table 9 (the maximum operating frequencies), and Table 10 (the consumed power). There is a similar organization for each of these tables. Benchmark names are in the table rows. The investigated methods are shown in the table columns. The complexity of a particular benchmark is shown in the last column. The row “Sum” includes results of summation for corresponding columns. In the row “Percentage”, we show the percentage of the summarized characteristics of various FSM circuits with respect to P 2 T Y -based FSMs.
From Table 7, we can find that, compared to the other investigated methods, the circuits of P 2 T Y -based FSMs consume the minimum number of LUTs. The proposed approach provides the following gain: 1. 48.97% regarding the Auto-based FSMs; 2. 69.98% regarding the One-Hot-based FSMs; 3. 26.33% regarding the JEDI-based FSMs; and 4. 9.44% regarding the P T Y -based FSMs. In our opinion, this gain is associated with a decrease in the number of transformed state codes compared to P T Y -based FSMs. Due to this, the LUT count of LUTerASV is lower than that of the code transformer of equivalent P T Y -based FSMs. Additionally, the gain can be achieved due to reducing the cardinality of the partition of states. The fulfillment of the condition (J + 1) < K provides a decrease in the required number of LUT inputs for elements of LUTerFA compared to that of the LUTs of LUTerPF. This phenomenon can lead to a decrease in the LUT count.
The following phenomenon is clear from Table 7: if an FSM has the complexity C0, then there are the same LUT counts for equivalent FSMs based on collection encoding. Moreover, in this case, other FSMs have better values of LUT counts than P T Y - and P 2 T Y -based FSMs. We can explain this in the following way. If an FSM has the complexity C0, then the condition (4) takes place. In this case, each SOP (2) and (3) is implemented by a single LUT. Therefore, in this case, there is no need to use various structural decomposition methods. However, regardless of the validity of condition (4), the encoding of output collections is executed for both P T Y - and P 2 T Y -based FSMs. As a result of this, the block LUTerOF is used. This block consumes additional LUTs compared to other researched methods. Due to validity of (4), there are no partial functions for FSMs having the complexity C0. As a result, there is no need to assemble blocks (LUTerFA and LUTerPF). This means that both P T Y and P 2 T Y FSMs degenerate into equivalent PY FSMs.
Now, let us analyze the temporal characteristics of FSM circuits. They are represented in Table 8 (the cycle time measured in nanoseconds) and Table 9 (the maximum operating frequency measured in megahertz).
Analysis of Table 8 shows that JEDI-based FSMs are the fastest. It also shows that P T Y -based FSMs are marginally slower than circuits of P 2 T Y -based FSMs (the average loss is 0.56%). At the same time, the proposed approach generates circuits with worse time characteristics than the circuits of P FSMs. The Auto-based FSMs are 0.09% faster than the P 2 T Y -based FSMs. The One-Hot-based FSMs are 0.73% faster than the P 2 T Y -based FSMs. Finally, JEDI-based FSMs are 5.93% faster than P 2 T Y -based FSMs. If the FSM complexity exceeds C0, then both P T Y - and P 2 T Y -based FSMs have three-level circuits. At the same time, it is difficult to estimate a priori the number of logic levels in circuits of P FSMs. It all depends on the number of literals in the implemented sum-of-products.
As follows from Table 8, if FSM complexity is equal to C0, then cycle times are the same for equivalent P 2 T Y - and P T Y -based FSMs. This phenomenon takes place because, in this case, both P 2 T Y - and P T Y -based FSMs turn into PY FSMs. However, if we look at the most complex FSMs having the complexity C4, we will see that the proposed method allows for obtaining the fastest circuits. Thus, the performance of P 2 T Y FSMs becomes better and better as the synthesized FSMs become more complex.
As follows from Table 9, on average, the circuits of P 2 T Y FSMs are slower compared to the circuits of P-based FSMs. Our approach loses 1.6% to the Auto-based FSMs. It loses 1.43% to the One-Hot-based FSMs. The JEDI-based FSMs have the greatest gain (6.54%). Only P T Y -based FSMs are a bit slower than P 2 T Y -based FSMs. Obviously, the reasons for the loss in frequency are the same as the reasons for the loss in cycle time. Additionally, analysis of Table 9 shows that, starting with complexity level C2, our method allows us to produce faster circuits compared to the other methods under study.
It is known [48] that one of the most important characteristics of FSM circuits is their power consumption. In particular, it is important in the case of mobile and autonomous cyber-physical systems [49]. Very often, a designer should make the choice among the area-temporal characteristics and the power consumption of a particular device. The values of power consumption can be taken from the Vivado reports. The power consumption is measured for the maximum possible value of the operating frequency. We show the experimental results for power consumption in Table 10.
The proposed method reduces the numbers of LUTs in FSM circuits compared with this characteristic of equivalent P T Y -based FSMs. Very often, such improvement results in an increase in power consumption [19]. This phenomenon takes place for our method. However, as follows from comparison of P T Y - and P 2 T Y -based FSMs (Table 10), P T Y FSMs have a very small gain in power consumption. Compared to P T Y -based FSMs, the loss in power consumption averages 1.55%. Additionally, JEDI-based FSMs require less power than equivalent P 2 T Y FSMs. The proposed approach allows for obtaining FSM circuits with less power consumption than for both Auto-based FSMs (11.95% of gain) and One-Hot-based FSMs (19.29% of gain).
If FSMs have complexity C0, then both P T Y and P 2 T Y FSMs have equal values of power consumption. If the FSM complexity exceeds C0, then P T Y FSMs always require less power than equivalent P 2 T Y FSMs. We see the following reason for this situation. In P T Y FSMs, the state variables enter only block LUTerASV. In contrast to this, in P 2 T Y FSMs, the outputs of LUTerFA are connected with two blocks (LUTerASV and CoreFC). It is known [31] that interconnections consume up to 70% of power. Therefore, the more interconnections, the more power is consumed.
Let us sum up some results of the comparison of equivalent P T Y and P 2 T Y FSMs. If FSMs have complexity C0, then both models have the same values of the basic characteristics. For other levels of complexity, P 2 T Y FSMs have better spatial characteristics (the required FPGA chip area) than their single-core counterparts based on twofold state assignment. For rather simple FSMs, P T Y FSMs have better temporal characteristics. However, as the complexity increases, the cycle times (and maximum operating frequencies) of P 2 T Y FSMs gradually become better than those of their single-core counterparts. The FSM circuits based on the proposed method always require more power. However, this loss is very small (it does not exceed 2% on average). This comparison leads to the following conclusion: P 2 T Y FSMs should be used instead of P T Y FSMs if the required chip area is the main optimality criterion of the designed LUT-based circuits. This conclusion is supported by the diagrams shown in Figure 12.
Under certain conditions, the proposed method can be applied to implement the LUT-based circuit of any sequential block. In this case, neither the algorithm for the functioning of this block nor the scope of the digital system in which this block operates is important. The possibility of applying the model of P 2 T Y FSM depends on the distribution of inputs i l I between the states s m S . If this distribution leads to the fulfillment of condition (4), then there is no need for optimization (because the circuit of P FSM has the best possible characteristics). If condition (4) is violated but the distribution leads to the fulfillment of condition (10), then the method can be applied. Otherwise, it is impossible to find a partition of the set of states for which each partial function is represented by a single-LUT circuit. The proposed method can be applied only if condition (18) is satisfied for some states s m S . In this case, the corresponding partial functions depending on the state variables T r S V are implemented using single-LUT circuits. The more states that satisfy condition (18), the greater the gain from applying our method compared to using P T Y FSMs. However, if condition (18) is satisfied for all states, then there is no point in applying either P 2 T Y or P T Y FSMs. In this case, both of these models degenerate into a PY FSM. Thus, it is advisable to use the proposed method only if condition (18) is satisfied for a number of states (but not for all M states), and condition (10) for the rest.

7. Generalized FSM Architecture

Unfortunately, there is a condition where the proposed method cannot be applied. For a given FSM, let the set of states include at least a single state s_m ∈ S for which the following condition is satisfied:
L(s_m) \ge S_L.  (35)
It is obvious that the state satisfying condition (35) cannot be included in either set S 0 or set S1. To obtain partial functions generated during transitions from this state, it is necessary to apply the methods of functional decomposition. Thus, to take into account the presence of such states, it is necessary to introduce a CoreFD based on functional decomposition into the architecture of P 2 T Y FSM shown in Figure 5.
We propose to split the set S into three disjoint sets (S_0, S_2, S_3). The set S_0 ⊆ S includes the states satisfying condition (18). The set S_3 ⊆ S includes the states satisfying condition (35). The set S_2 ⊆ S includes the rest of the states, i.e., S_2 = S \ (S_0 ∪ S_3). The transitions from the states s_m ∈ S_3 are determined by the FSM inputs creating the set I3. To encode these states, it is necessary to create the set of state variables ASV3. This set includes its own unique state variables. Three sets of PBFs are generated by the LUTs of CoreFD: DF (IMFs generated during the transitions from the states s_m ∈ S_3); AVF (additional variables encoding the OCs generated during the transitions from the states s_m ∈ S_3); and AV3 (unique additional variables encoding the OCs generated during the transitions from the states s_m ∈ S_3). Therefore, the following partial SBFs are generated by the LUTs of CoreFD:
D^F = D^F(ASV_3, I_3);    (36)
AV_3 = AV_3(ASV_3, I_3);    (37)
AV^F = AV^F(ASV_3, I_3).    (38)
We denote the proposed generalized architecture of the LUT-based FSM circuit as P F 2 T Y. Here, the letter “F” indicates the presence of the block CoreFD. The proposed generalized architecture is shown in Figure 13.
The generalized architecture (Figure 13) includes three cores of PBFs. CoreFC generates PBFs for states satisfying condition (18). CoreFD generates PBFs for states satisfying condition (35). CorePC generates PBFs for the rest of the states.
In the P F 2 T Y FSM, LUTerFA generates the full functions represented by the following systems of disjunctions:
D = D(D^0, D^1, …, D^J, D^F);    (39)
AV_2 = AV_2(AV^0, AV^1, …, AV^J, AV^F).    (40)
LUTerOF implements SBF (7). However, now the set AV is represented in the following form: AV = AV_1 ∪ AV_2 ∪ AV_3. To encode the states of a P F 2 T Y FSM, the set ASV_0 is used, where ASV_0 = SV ∪ ASV_2 ∪ ASV_3. Therefore, LUTerASV generates the SBF:
ASV_0 = ASV_0(SV).    (41)
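To illustrate how the full functions (39)–(41) are assembled from partial ones, the following minimal Python sketch models LUTerFA as a bitwise OR of the partial vectors produced by the cores and LUTerASV as a lookup that maps a maximum binary state code (SV) into the additional state variables. All function names, data shapes, and the example values are our own illustrative assumptions; the actual circuit performs these operations with LUTs and interconnects, and in each cycle only the core responsible for the current state drives nonzero partial values.

```python
# A minimal, illustrative sketch (not the actual synthesis tool): partial vectors are
# modelled as integers whose bits stand for D_1, ..., D_R (or for additional variables),
# and the disjunctions (39), (40) become bitwise ORs.

def luter_fa(partial_d: list[int], partial_av2: list[int]) -> tuple[int, int]:
    """Model of LUTerFA: OR the partial vectors produced by CoreFC, the CorePC
    blocks, and CoreFD into the full vectors D and AV_2 (systems (39) and (40))."""
    d = 0
    for vec in partial_d:
        d |= vec
    av2 = 0
    for vec in partial_av2:
        av2 |= vec
    return d, av2

def luter_asv(sv_code: int, asv_table: dict[int, int]) -> int:
    """Model of LUTerASV: transform the maximum binary code (SV) of the current
    state into its additional state variables (system (41))."""
    return asv_table.get(sv_code, 0)

# Hypothetical example: only CoreFC drives nonzero values in this cycle.
d, av2 = luter_fa(partial_d=[0b0100, 0b0000, 0b0000], partial_av2=[0b10, 0b00])
asv0 = luter_asv(sv_code=0b0101, asv_table={0b0101: 0b10})
print(f"D = {d:04b}, AV2 = {av2:02b}, ASV0 = {asv0:02b}")
```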
Naturally, the proposed architecture is universal. In this paper, we propose the following method for synthesizing an FSM with a generalized architecture:
  • Creating an STT of a P Mealy FSM.
  • Pre-formation of the sets S_0, S_2, and S_3.
  • Pre-formation of the partition π_S2 of the set S_2.
  • Final formation of the sets S_0 and S_2 and of the partition π_S2.
  • Creating full state codes FC(s_m) for the states s_m ∈ S.
  • Encoding of output collections O_q ∈ O and finding SBF (7).
  • Creating a table of CoreFC and deriving SBFs (19)–(21).
  • Encoding of the states s_m ∈ S_j by partial state codes PC(s_m).
  • Generating tables describing the blocks of CorePC and deriving systems (22) and (23).
  • Encoding of the states s_m ∈ S_3 by partial state codes PC(s_m).
  • Generating tables describing the blocks of CoreFD and deriving systems (36)–(38).
  • Creating a table of LUTerFA and SBFs (39) and (40).
  • Creating a table of LUTerASV and system (41).
  • Implementing the P F 2 T Y Mealy FSM circuit using the internal resources of a particular FPGA chip.
We hope that all the steps of this method are clear from the previous text. We do not, however, consider this method in detail here; this will be the subject of a separate study. We now show that the generalized architecture (Figure 13) generates six more architectures. Three conditions are used for this purpose. The fulfillment of condition (18) indicates the presence of the block CoreFC in the FSM circuit architecture; this means that the set S_0 contains at least one element. The fulfillment of condition (35) indicates the presence of the block CoreFD; in this case, the set S_3 contains at least one element. Finally, the fulfillment of the condition
S_L > L(s_m) > S_L − R_1    (42)
indicates the presence of the block CorePC. In this case, the set S_2 contains at least one element. The possible FSM models are shown in Table 11. Additionally, the table rows contain the conditions (or their conjunctions) under which a particular architecture should be used.
The first three columns of the table contain the names of the sets ( S 0 , S 2 , S 3 ) and corresponding architectural blocks (CoreFC, CorePC, CoreFD). The fourth column contains the model designation. The fifth column shows which combination of conditions leads to the model from a particular row. If there is a zero (one) at the intersection of the column with the block and the row with the model, then this block is not included (is included) in the FSM architecture corresponding to this row.
For example, if all states satisfy condition (35), then the architecture includes only CoreFD. We denote this architecture by the symbol P F Y; this is the first row of Table 11. If some of the states satisfy condition (35) and the others satisfy condition (42), then the architecture includes the blocks CorePC and CoreFD (row 3). This leads to P F T Y FSMs, and so on. The last row corresponds to the generalized FSM architecture, which includes three cores of partial functions.
Using Table 11 and the generalized architecture, we can obtain the architecture of any model represented by this table. Obviously, it is possible to transform the design method for P F 2 T Y FSMs into a design method for any other model. In this case, of particular interest are the implementation of Step 2 of the proposed method and the determination of the model corresponding to its outcome. The algorithm for performing these steps is presented in Figure 14.
Let us consider this algorithm. Block 1 shows the initial information (the FSM is represented by its STG, and the FPGA chip is represented by the number of inputs of its LUT elements). Next, this STG must be converted into the equivalent STT (block 2).
The distribution of states over the sets S_0, S_2, and S_3 is performed in a cycle including blocks 3–7. The distribution starts from the first state (block 3). In block 4, condition (18) is checked. If this condition is met (output “Yes” from block 4), then the state s_m ∈ S is placed in the set S_0 (block FC). If this condition is violated (output “No” from block 4), then condition (35) is checked (block 5). If this condition is met (output “Yes” from block 5), then the state s_m ∈ S is placed in the set S_3 (block FD). If this condition is violated (output “No” from block 5), then the state s_m ∈ S is placed in the set S_2 (block PC). The analysis of the next state then begins (block 6). If all states are distributed (output “Yes” from block 7), then the FSM architecture selection begins (transition to block 8). Otherwise (output “No” from block 7), the analysis continues (transition to block 4).
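As a rough illustration of this loop, the following Python sketch distributes states according to the reconstructed forms of conditions (18), (42), and (35); here L maps each state to the number of FSM inputs determining its transitions, S_L is the number of LUT inputs, and R_1 is the length of the maximum binary state code. The exact inequalities, names, and example values are our assumptions for illustration only.

```python
# Illustrative sketch of blocks 3-7 of Figure 14, assuming condition (18) has the form
# L(s_m) <= S_L - R_1, condition (35) the form L(s_m) >= S_L, and condition (42) covers
# the remaining range S_L > L(s_m) > S_L - R_1.

def distribute_states(L: dict[str, int], s_l: int, r_1: int):
    """Place every state into S_0 (CoreFC), S_3 (CoreFD), or S_2 (CorePC)."""
    s0, s2, s3 = set(), set(), set()
    for state, num_inputs in L.items():
        if num_inputs <= s_l - r_1:    # condition (18): MBC plus inputs fit one LUT
            s0.add(state)
        elif num_inputs >= s_l:        # condition (35): functional decomposition needed
            s3.add(state)
        else:                          # condition (42): partial state codes suffice
            s2.add(state)
    return s0, s2, s3

# Hypothetical example for a 6-input LUT (S_L = 6) and R_1 = 4 state variables.
print(distribute_states({"s1": 2, "s2": 4, "s3": 7}, s_l=6, r_1=4))
# -> ({'s1'}, {'s2'}, {'s3'})
```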
To choose an architecture, we analyze whether empty sets are obtained in the process of distributing states. The analysis begins with checking the set S 0 (block 8). As follows from Table 11, if the set S 0 is empty (output “Yes” from block 8), then the choice is made among three architectures ( P F Y , P T Y , P F T Y ). Set S2 is analyzed (block 9). If it is empty (output “Yes” from block 9), then the FSM P F Y is selected (block 11). If set S2 is not empty (output “No” from block 9), then set S3 is analyzed (block 12). If it is empty (output “Yes” from block 12), then the FSM P T Y is selected (block 15). If set S3 is not empty (output “No” from block 12), then the FSM P F T Y is selected (block 16).
If the set S 0 is not empty (output “No” from block 8), then the choice is made among four architectures ( P Y , P F F Y , P 2 T Y , P F 2 T Y ). Set S2 is analyzed (block 10). If it is empty (output “Yes” from block 10), then set S3 is analyzed (block 13). If S3 is empty (output “Yes” from block 13), then the FSM PY is selected (block 17). If S3 is not empty (output “No” from block 13), then the FSM P F F Y is selected (block 18). If set S2 is not empty (output “No” from block 10), then set S3 is analyzed (block 14). If S3 is empty (output “Yes” from block 14), then the FSM P 2 T Y is selected (block 19). If S3 is not empty (output “No” from block 14), then the FSM P F 2 T Y is selected (block 20).
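The selection tree (blocks 8–20) thus amounts to testing which of the three sets are empty. A compact Python sketch of this decision, mirroring Table 11, might look as follows; the model names are returned as plain strings written compactly, and the function is only an illustration of the logic, not part of the published tool.

```python
def select_architecture(s0: set, s2: set, s3: set) -> str:
    """Choose the FSM model from the emptiness of S_0, S_2, and S_3
    (Table 11, blocks 8-20 of Figure 14)."""
    if not s0:                              # block 8: S_0 is empty
        if not s2:                          # block 9
            return "PFY"                    # block 11: only CoreFD
        return "PTY" if not s3 else "PFTY"  # blocks 12, 15, 16
    if not s2:                              # block 10: S_0 is not empty
        return "PY" if not s3 else "PFFY"   # blocks 13, 17, 18
    return "P2TY" if not s3 else "PF2TY"    # blocks 14, 19, 20

# Hypothetical example: S_0 and S_2 are non-empty, S_3 is empty.
print(select_architecture({"s1"}, {"s2"}, set()))  # -> P2TY
```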
Thus, the architecture has been chosen, and it is necessary to proceed to the synthesis of the corresponding FSM model. We hope that the relationship between Table 11 and the algorithm (Figure 14) is transparent enough.

8. Conclusions

Modern FPGAs are widely used in the design of cyber-physical systems [13]. These chips are very powerful: a single FPGA chip is enough to implement practically any block (either combinational or sequential) of a modern CPS [50]. The reverse side of this universality is the extremely small number of LUT inputs [21,51]. This is a serious drawback that significantly complicates the design process: various methods of functional decomposition must be applied at the technology mapping step. It is known that FD-based circuits are multi-level, and the disadvantages of multi-level circuits are well known: they are slower and less energy efficient than equivalent single-level counterparts.
Better results can be obtained by replacing functional decomposition with structural decomposition [10], as shown, for example, in [19]. In [20], FSM circuit optimization is achieved by using twofold state assignment and encoding of output collections. The resulting P T Y FSM circuits have lower LUT counts than their FD-based counterparts. However, twofold state assignment requires the transformation of maximum binary state codes into their extended equivalents. As a result, a code transformer must be used, which consumes additional resources of the FPGA fabric.
To reduce the LUT count in the circuits of P T Y-based FSMs, we propose to use two LUT-based blocks (cores). To do this, we use the main ideas from [22]. Both cores generate systems of partial Boolean functions. This leads to P 2 T Y FSMs with the following peculiarity: one of the cores uses the MBCs, whereas the second core uses the partial state codes. Our approach reduces LUT counts and slightly improves temporal characteristics compared with equivalent P T Y-based FSMs. The overhead of the proposed method is a rather insignificant increase in power consumption (up to 1.56% on average). We hope that the proposed P 2 T Y FSMs can serve as an efficient tool for implementing FPGA-based sequential devices in modern cyber-physical systems.
The conducted experiments have shown that, under certain conditions, the proposed method produces better results than methods based entirely on either maximum binary or one-hot state codes. If some partial functions are implemented using a single LUT, then our method improves the spatial, temporal, and energy characteristics of the LUT-based circuits of sequential blocks. We believe that our method can be modified to take into account state assignment methods other than the twofold one. We see this as a further direction for the development of the proposed method.
Under certain conditions, the transition from the proposed model to other models is possible (Table 11). In the most general case, the FSM architecture consists of three cores of partial functions. There are also three dual-core architectures. One of the directions of our further research is the development of synthesis methods and the study of the characteristics of LUT-based FSM circuits based on these two- and three-core models.

Author Contributions

Conceptualization, A.B., L.T. and K.K.; methodology, A.B., L.T., K.K. and S.S.; software, A.B., L.T. and K.K.; validation, A.B., L.T. and K.K.; formal analysis, A.B., L.T., K.K. and S.S.; investigation, A.B., L.T. and K.K.; writing—original draft preparation, A.B., L.T., K.K. and S.S.; supervision, A.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available in the article.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CLB    configurable logic block
CPS    cyber-physical system
ESC    extended state code
FD     functional decomposition
FPGA   field-programmable gate array
FSM    finite state machine
IMF    input memory function
LUT    look-up table
MBC    maximum binary code
OC     output collection
PBF    partial Boolean function
SBF    system of Boolean functions
SD     structural decomposition
SOP    sum-of-products
STG    state transition graph
STT    state transition table
TSA    twofold state assignment

References

  1. Alur, R. Principles of Cyber-Physical Systems; MIT Press: Cambridge, MA, USA, 2015. [Google Scholar]
  2. Suh, S.C.; Tanik, U.J.; Carbone, J.N.; Eroglu, A. Applied Cyber-Physical Systems; Springer: New York, NY, USA, 2014. [Google Scholar]
  3. Marwedel, P. Embedded System Design: Embedded Systems Foundations of Cyber-Physical Systems, and the Internet of Things, 3rd ed.; Springer International Publishing: New York, NY, USA, 2018. [Google Scholar]
  4. Kovtun, V.; Izonin, I.; Gregus, M. Reliability model of the security subsystem countering to the impact of typed cyber-physical attacks. Sci. Rep. 2022, 12, 12849. [Google Scholar] [CrossRef]
  5. Wojnakowski, M.; Wisniewski, R.; Bazydlo, G.; Poplawski, M. Analysis of safeness in a Petri net-based specification of the control part of cyber-physical systems. Int. J. Appl. Math. Comput. Sci. 2021, 31, 647–657. [Google Scholar]
  6. Wisniewski, R.; Bazydlo, G.; Gomes, L.; Costa, A.; Wojnakowski, M. Analysis and design automation of cyber-physical system with hippo and IOPT-tools. In Proceedings of the IECON 2019—45th Annual Conference of the IEEE Industrial Electronics Society, Lisbon, Portugal, 14–17 October 2019; Volume 1, pp. 5843–5848. [Google Scholar]
  7. Bazydlo, G.; Costa, A.; Gomes, L. Integrating different modelling formalisms supporting co-design development of controllers for cyber-physical systems—A case study. In Proceedings of the 2022 IEEE 9th International Conference on e-Learning in Industrial Electronics (ICELIE), Brussels, Belgium, 17–20 October 2022; pp. 1–6. [Google Scholar]
  8. Wisniewski, R.; Wojnakowski, M.; Li, Z. Design and Verification of Petri-Net-Based Cyber-Physical Systems Oriented toward Implementation in Field-Programmable Gate Arrays—A Case Study Example. Energies 2023, 16, 67. [Google Scholar] [CrossRef]
  9. Wisniewski, R.; Benysek, G.; Gomes, L.; Kania, D.; Simos, T.; Zhou, M. IEEE Access Special Section: Cyber-Physical Systems. IEEE Access 2019, 7, 157688–157692. [Google Scholar] [CrossRef]
  10. Barkalov, A.; Titarenko, L.; Mazurkiewicz, M. Foundations of Embedded Systems; Springer International Publishing: New York, NY, USA, 2019. [Google Scholar]
  11. Gajski, D.D.; Abdi, S.; Gerstlauer, A.; Schirner, G. Embedded System Design: Modeling, Synthesis and Verification; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2009. [Google Scholar]
  12. Gazi, O.; Arli, A. State Machines Using VHDL: FPGA Implementation of Serial Communication and Display Protocols; Springer: Berlin, Germany, 2021; p. 326. [Google Scholar]
  13. Bhattacharjya, A.; Wisniewski, R.; Nidumolu, V. Holistic Research on Blockchain’s Consensus Protocol Mechanisms with Security and Concurrency Analysis Aspects of CPS. Electronics 2022, 11, 2760. [Google Scholar] [CrossRef]
  14. Baranov, S. Logic Synthesis of Control Automata; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1994. [Google Scholar]
  15. Czerwinski, R.; Kania, D. Finite State Machine Logic Synthesis for Complex Programmable Logic Devices; Lecture Notes in Electrical Engineering; Springer: Berlin/Heidelberg, Germany, 2013; Volume 231. [Google Scholar]
  16. Baranov, S. Finite State Machines and Algorithmic State Machines: Fast and Simple Design of Complex Finite State Machines; Amazon: Seattle, WA, USA, 2018; p. 185. [Google Scholar]
  17. Micheli, G.D. Synthesis and Optimization of Digital Circuits; McGraw-Hill: Cambridge, MA, USA, 1994. [Google Scholar]
  18. Islam, M.M.; Hossain, M.S.; Shahjalal, M.D.; Hasan, M.K.; Jang, Y.M. Area-time efficient hardware implementation of modular multiplication for elliptic curve cryptography. IEEE Access 2020, 8, 73898–73906. [Google Scholar] [CrossRef]
  19. Barkalov, A.; Titarenko, L.; Krzywicki, K. Structural Decomposition in FSM Design: Roots, Evolution, Current State—A Review. Electronics 2021, 10, 1174. [Google Scholar] [CrossRef]
  20. Barkalov, O.; Titarenko, L.; Mielcarek, K. Hardware reduction for LUT-based Mealy FSMs. Int. J. Appl. Math. Comput. Sci. 2018, 28, 595–607. [Google Scholar] [CrossRef]
  21. AMD Xilinx FPGAs. Available online: https://www.xilinx.com/products/silicon-devices/fpga.html (accessed on 1 March 2023).
  22. Barkalov, A.; Titarenko, L.; Krzywicki, K. Using a Double-Core Structure to Reduce the LUT Count in FPGA-Based Mealy FSMs. Electronics 2022, 11, 3089. [Google Scholar] [CrossRef]
  23. Baranov, S. High-Level Synthesis of Digital Systems: For Data-Path and Control Dominated Systems; Amazon: Seattle, WA, USA, 2018; p. 207. [Google Scholar]
  24. Kubica, M.; Opara, A.; Kania, D. Logic Synthesis Strategy Oriented to Low Power Optimization. Appl. Sci. 2021, 11, 8797. [Google Scholar] [CrossRef]
  25. Zhao, X.; He, Y.; Chen, X.; Liu, Z. Human-Robot collaborative Assembly Based on Eye-Hand and a Finite State Machine in a Virtual Environment. Appl. Sci. 2021, 11, 5754. [Google Scholar] [CrossRef]
  26. Koo, B.; Bae, J.; Kim, S.; Park, K.; Kim, H. Test case generation method for increasing software reliability in Safety-Critical Embedded Systems. Electronics 2020, 9, 797. [Google Scholar] [CrossRef]
  27. Senhadji-Navarro, R.; Garcia-Vargas, I. Methodology for Distributed-ROM-based Implementation of Finite State Machines. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2020, 40, 2411–2415. [Google Scholar] [CrossRef]
  28. Skliarova, I. A Survey of Network-Based Hardware Accelerators. Electronics 2022, 11, 1029. [Google Scholar] [CrossRef]
  29. Mishchenko, A.; Brayton, R.; Jiang, J.H.; Jang, S. Scalable don’t-care-based logic optimization and resynthesis. ACM Trans. Reconfigurable Technol. Syst. 2011, 4, 1–23. [Google Scholar] [CrossRef]
  30. El-Maleh, A.H. A Probabilistic Tabu Search State Assignment Algorithm for Area and Power Optimization of Sequential Circuits. Arab. J. Sci. Eng. 2020, 45, 6273–6285. [Google Scholar] [CrossRef]
  31. Feng, W.; Greene, J.; Mishchenko, A. Improving FPGA performance with a S44 LUT structure. In Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA, 25–27 February 2018; pp. 61–66. [Google Scholar]
  32. Chapman, K. Multiplexer Design Techniques for Datapath Performance with Minimized Routing Resources. Application Note. 2012. Available online: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.259.5300&rep=rep1&type=pdf (accessed on 1 March 2023).
  33. Senhadji-Navarro, R.; Garcia-Vargas, I. Mapping Arbitrary Logic Functions onto Carry Chains in FPGAs. Electronics 2022, 11, 27. [Google Scholar] [CrossRef]
  34. Kubica, M.; Opara, A.; Kania, D. Technology Mapping for LUT-Based FPGA; Springer: Berlin, Germany, 2021; p. 208. [Google Scholar]
  35. Solov’ev, V.V. Implementation of finite-state machines based on programmable logic ICs with the help of the merged model of Mealy and Moore machines. J. Commun. Technol. Electron. 2013, 58, 172–177. [Google Scholar] [CrossRef]
  36. Park, J.; Yoo, H. Area-efficient fault tolerance encoding for Finite State Machines. Electronics 2020, 9, 1110. [Google Scholar] [CrossRef]
  37. Baranov, S. From Algorithm to Digital System: HLS and RTL Tool Synthagate in Digital System Design; Amazon: Seattle, WA, USA, 2020; p. 76. [Google Scholar]
  38. Barkalov, A.; Titarenko, L.; Krzywicki, K.; Saburova, S. Improving Characteristics of LUT-Based Three-Block Mealy FSMs’ Circuits. Electronics 2022, 11, 950. [Google Scholar] [CrossRef]
  39. Khatri, S.P.; Gulati, K. Advanced Techniques in Logic Synthesis, Optimizations and Applications; Springer: New York, NY, USA, 2011. [Google Scholar]
  40. Sentovich, E.M.; Singh, K.J.; Lavagno, L.; Moon, C.; Murgai, R.; Saldanha, A.; Savoj, H.; Stephan, P.R.; Brayton, R.K.; Sangiovanni-Vincentelli, A.L. SIS: A System for Sequential Circuit Synthesis; Technical Report; University of California, Berkeley: Berkeley, CA, USA, 1992. [Google Scholar]
  41. Tatalov, E. Synthesis of Compositional Microprogram Control Units for Programmable Devices. Master’s Thesis, Donetsk National Technical University, Donetsk, Ukraine, 2011. [Google Scholar]
  42. McElvain, K. LGSynth93 Benchmark; Mentor Graphics: Wilsonville, OR, USA, 1993. [Google Scholar]
  43. Scholl, C. Functional Decomposition with Application to FPGA Synthesis; Kluwer Academic Publishers: Boston, MA, USA, 2001. [Google Scholar]
  44. Barkalov, A.; Titarenko, L.; Krzywicki, K. Reducing LUT Count for FPGA-Based Mealy FSMs. Appl. Sci. 2020, 10, 5115. [Google Scholar] [CrossRef]
  45. Kubica, M.; Kania, D. Technology mapping oriented to adaptive logic modules. Bull. Pol. Acad. Sci. 2019, 67, 947–956. [Google Scholar]
  46. Mishchenko, A.; Chatterjee, S.; Brayton, R. Improvements to technology mapping for LUT-based FPGAs. IEEE Trans. CAD 2006, 27, 240–253. [Google Scholar]
  47. Vivado Design Suite User Guide: Synthesis. UG901 (v2019.1). Available online: https://www.xilinx.com/support/documentation/sw_manuals/xilinx2019_1/ug901-vivado-synthesis.pdf (accessed on 1 March 2023).
  48. Tiwari, A.; Tomko, K.A. Saving power by mapping finite-state machines into embedded memory blocks in FPGAs. Proc. Des. Autom. Test Eur. Conf. Exhib. 2004, 2, 916–921. [Google Scholar]
  49. Lucía, Ó.; Monmasson, E.; Navarro, D.; Barragán, L.A.; Urriza, I.; Artigas, J.I. Modern control architectures and implementation. Control Power Electron. Convert. Syst. 2018, 2, 477–502. [Google Scholar]
  50. Ruiz-Rosero, J.; Ramirez-Gonzalez, G.; Khanna, R. Field Programmable Gate Array Applications—A Scientometric Review. Computation 2019, 7, 63. [Google Scholar] [CrossRef]
  51. Altera. Cyclone IV Device Handbook. Available online: http://www.altera.com/literature/hb/cyclone-iv/cyclone4-handbook.pdf (accessed on 1 March 2023).
Figure 1. Architecture of P Mealy FSM.
Figure 2. Architecture of PY Mealy FSM.
Figure 3. Architecture of P T Y Mealy FSM.
Figure 4. Generalized diagram of a P T Y Mealy FSM.
Figure 5. Architecture of the P 2 T Y Mealy FSM.
Figure 6. Initial STG.
Figure 7. Maximum binary state codes for FSM A1.
Figure 8. Codes of output collections for Mealy FSM A1.
Figure 9. Partial state codes for Mealy FSM A1.
Figure 10. Karnaugh map for SBF ASV2(SV).
Figure 11. Interaction of FSM with other system blocks.
Figure 12. Percent summary of results.
Figure 13. Generalized architecture of P F 2 T Y Mealy FSM.
Figure 14. Selection of FSM model.
Table 1. STT of FSM A1.
S C S T I CT O CT qh
s 1 s 2 i 1 o 3 41
s 3 i 1 ¯ o 2 o 6 52
s 2 s 5 i 2 o 1 o 7 23
s 6 i 2 ¯ i 5 o 4 34
s 3 i 2 ¯ i 5 ¯ o 1 o 4 65
s 3 s 6 i 1 o 2 86
s 1 i 1 ¯ o 4 o 5 o 6 107
s 4 s 1 i 1 o 2 o 6 58
s 4 i 1 ¯ 19
s 5 s 1 i 5 o 1 o 4 610
s 2 i 5 ¯ i 6 o 5 711
s 7 i 5 ¯ i 6 ¯ o 1 o 5 o 7 912
s 6 s 4 i 3 o 4 313
s 5 i 3 ¯ o 1 o 7 214
s 7 s 4 i 2 o 5 715
s 8 i 2 ¯ o 1 o 4 616
s 8 s 7 i 3 o 1 o 5 o 7 917
s 4 i 3 ¯ i 7 118
s 9 i 3 ¯ i 7 ¯ o 5 719
s 9 s 4 i 5 o 1 o 7 220
s 1 i 5 ¯ i 7 121
s 8 i 5 ¯ i 7 ¯ o 5 722
Table 2. Table of CoreFC for Mealy FSM A1.
S C FC ( S c ) S T FC ( S T ) I 1 h AV h 0 AV 1 h D h 0 h
S 1 0000 S 2 0100 i 1 a 3 a 4 a 1 D 2 1
S 3 0001 i 1 ¯ a 1 D 4 2
S 3 0001 S 6 1000 i 1 a 4 a 1 D 1 3
S 1 0000 i 1 ¯ a 3 a 1 4
S 4 0010 S 1 0000 i 1 a 1 5
S 4 0010 i 1 ¯ D 3 6
Table 3. Table of CorePC( S 1 ).
S C PC ( S c ) S T FC ( S T ) I 2 h AV h 1 D h 1 h
S 2 01 S 5 0101 i 2 a 2 D 2 D 4 1
S 6 1000 i 2 ¯ i 5 a 4 D 1 2
S 3 0001 i 2 ¯ i 5 ¯ a 2 a 4 D 3 3
S 5 10 S 1 0000 i 5 a 2 a 4 4
S 2 0100 i 5 ¯ i 6 ¯ a 3 D 2 5
S 7 0110 i 5 ¯ i 6 ¯ a 2 a 3 D 2 D 3 6
S 7 11 S 4 0010 i 2 a 3 D 3 7
S 8 1001 i 2 ¯ a 2 a 4 D 1 D 4 8
Table 4. Table of CorePC( S 2 ).
S C PC ( S c ) S T FC ( S T ) I 2 h AV h 1 D h 1 h
S 6 01 S 4 0010 i 3 a 4 D 3 1
S 5 0101 i 3 ¯ a 2 D 2 D 4 2
S 8 10 S 7 0110 i 3 a 2 a 3 3
S 4 0010 i 3 ¯ i 7 D 2 4
S 9 1010 i 3 ¯ i 7 ¯ a 3 D 2 D 3 5
S 9 11 S 4 0010 i 5 a 2 D 3 6
S 1 0000 i 5 ¯ i 7 7
S 8 1001 i 5 ¯ i 7 ¯ a 3 D 1 D 4 8
Table 5. Table of LUTerFA.
Function CoreFC CorePC
S 1 S 2
D 1 111
D 2 111
D 3 111
D 4 111
a 1 100
a 2 011
a 3 111
a 4 111
Table 6. Table of LUTerASV.
S m FC ( S m ) PC ( S m ) AV m
s 2 01000100 v 2
s 5 01011000 v 1
s 6 10000001 v 4
s 7 01101100 v 1 v 2
s 8 10010010 v 3
s 9 10100011 v 3 v 4
Table 7. Experimental results (LUT counts).
Benchmark | Auto | One-Hot | JEDI | PTY | Our Approach | Complexity
bbara2121141312C1
bbsse4044312622C1
bbtas7771010C0
beecount2222171311C1
cse4773433531C1
dk141930131311C1
dk151819151110C1
dk161736141212C1
dk1771471010C0
dk2746599C0
dk5121111101212C0
donfile3333262117C1
ex17983624744C2
ex2111110109C1
ex31111111212C0
ex42119181412C1
ex51111111212C0
ex62941272420C1
ex767666C1
keyb5068474138C1
kirkman5470514035C2
lion47477C0
lion9813799C0
mark12828252118C1
mc710799C0
modulo128881010C0
opus3333272420C1
planet138138958076C2
planet1138138958076C2
pma102102948173C2
s173107696054C2
s14881321391169389C2
s149413414011810192C2
s1a5789514339C2
s2082342211916C2
s2710221099C1
s3863346292319C1
s4202950282420C4
s5106767514438C4
s8201313131110C1
s832106100867267C4
s8409897807062C4
sand143143125107101C3
shiftreg37388C0
sse4044373127C1
styr102129907873C2
tma5246463732C2
Sum20992395178015421409
Percentage, %148.97169.98126.33109.44100.00
Table 8. Experimental results (cycle time in nanoseconds).
Benchmark | Auto | One-Hot | JEDI | PTY | Our Approach | Complexity
bbara8.8118.8118.3529.3949.601C1
bbsse10.0969.6429.2139.7639.924C1
bbtas8.4978.4978.4519.4979.497C0
beecount9.6059.6058.9419.5689.740C1
cse10.5589.8409.3439.5709.764C1
dk148.8219.3958.7629.9649.070C1
dk158.7978.9988.7359.8909.009C1
dk169.4919.3208.6729.3279.539C1
dk178.6179.5878.6179.6179.617C0
dk278.3258.4248.3699.3259.325C0
dk5128.5668.5668.4779.5669.566C0
donfile9.0339.0348.5097.9167.628C1
ex110.42510.9559.4548.4968.496C2
ex28.6358.6358.5969.5669.738C1
ex38.7318.7318.7079.7319.731C0
ex49.2149.3158.8749.7459.902C1
ex59.1479.1479.11910.14710.147C0
ex69.5649.7729.3309.7019.863C1
ex78.5988.5788.5849.5829.751C1
keyb10.12110.6999.66610.06310.174C1
kirkman10.97110.39210.28010.62110.300C2
lion8.5398.5018.5419.5959.595C0
lion98.4708.9988.4449.4279.427C0
mark19.8259.8259.3439.94210.063C1
mc8.6888.7198.6829.6889.688C0
modulo128.3028.3028.2999.3029.302C0
opus9.6849.6849.27510.29010.353C1
planet11.26411.2649.0739.8979.791C2
planet111.26411.2649.0739.8979.791C2
pma10.63410.6349.68110.0159.963C2
s110.62311.15410.15610.66910.308C2
s148811.01311.37210.15510.31410.299C2
s149410.48710.6549.87810.63010.163C2
s1a10.3139.4629.70410.38510.185C2
s2089.5039.4349.3619.8599.684C2
s278.6728.8628.6629.6719.832C1
s3869.6769.4949.3119.90510.198C1
s4209.8649.7809.7559.7199.632C4
s5109.7429.7429.1559.6899.115C4
s82010.69110.6419.77510.31710.416C1
s83210.97510.6389.8669.6979.233C4
s8409.1959.2289.1589.1089.032C4
sand12.39012.39011.65210.99510.895C3
shiftreg8.3027.2657.0918.8028.802C0
sse10.0969.6429.45510.16510.260C1
styr11.06711.49710.66611.54011.646C2
tma9.83110.4959.82110.24710.197C2
Sum453.73454.88431.08460.81458.25
Percentage, %99.0199.2794.07100.56100.00
Table 9. Experimental results (maximum operating frequency in MHz).
Benchmark | Auto | One-Hot | JEDI | PTY | Our Approach | Complexity
bbara113.496113.496119.727106.456104.152C1
bbsse99.049103.713108.539102.428100.766C1
bbtas117.687117.687118.336105.295105.295C0
beecount104.112104.112111.839104.520102.669C1
cse94.713101.626107.030104.488102.422C1
dk14113.364106.439114.134100.361110.248C1
dk15113.675111.137114.487101.111111.002C1
dk16105.362107.294115.316107.219104.835C1
dk17116.049104.308116.049103.982103.982C0
dk27120.122118.709119.494107.240107.240C0
dk512116.740116.740117.963104.537104.537C0
donfile110.706110.696117.517126.323131.093C1
ex195.92291.281105.777117.700117.700C2
ex2115.808115.808116.340104.540102.692C1
ex3114.536114.536114.846102.766102.766C0
ex4108.530107.352112.690102.621100.991C1
ex5109.327109.327109.66198.55398.553C0
ex6104.556102.333107.183103.082101.394C1
ex7116.306116.576116.495104.364102.550C1
keyb98.80693.466103.45399.37598.291C1
kirkman91.14896.23297.27294.15297.084C2
lion117.110117.634117.083104.226104.226C0
lion9118.065111.136118.421106.080106.080C0
mark1101.781101.781107.032100.58599.372C1
mc115.102114.694115.174103.221103.221C0
modulo12120.454120.454120.498107.505107.505C0
opus103.265103.265107.81897.18196.590C1
planet88.77788.777110.222101.038102.132C2
planet188.77788.777110.222101.038102.132C2
pma94.03994.039103.29399.855100.375C2
s194.13489.65398.46593.73197.009C2
s148890.80087.93498.47296.96097.101C2
s149495.35793.861101.23694.07498.396C2
s1a96.963105.687103.04896.29798.188C2
s208105.231106.000106.825101.426103.260C2
s27115.314112.842115.449103.400101.705C1
s386103.348105.329107.401100.96498.059C1
s420101.378102.249102.514102.891103.822C4
s510102.648102.648109.226103.205109.704C4
s82093.53793.975102.30096.93296.006C1
s83291.11794.001101.354103.126108.309C4
s840108.755108.364109.196109.795110.717C4
sand80.71180.71185.82190.94991.784C3
shiftreg120.454137.645141.028113.612113.612C0
sse99.049103.713105.76098.37597.468C1
styr90.35986.97993.75486.65785.867C2
tma101.71995.284101.81997.58898.065C2
Sum4918.264910.305157.584811.824840.96
Percentage, %101.60101.43106.5499.40100.00
Table 10. Experimental results (consumed power in watts).
Benchmark | Auto | One-Hot | JEDI | PTY | Our Approach | Complexity
bbara0.9610.9610.8800.8980.911C1
bbsse2.6511.6372.1442.2282.243C1
bbtas0.9000.9000.9000.9230.923C0
beecount2.0112.0111.4011.4891.497C1
cse1.3891.4501.3221.3461.362C1
dk143.3393.7103.3323.3413.368C1
dk151.7832.2851.7791.7721.788C1
dk163.3343.1092.8792.8812.895C1
dk172.2682.3022.2582.2862.286C0
dk271.5241.2101.5141.5391.539C0
dk5121.8521.8521.7011.7431.743C0
donfile1.0761.0760.9700.9921.034C1
ex14.5643.4302.8042.6122.688C2
ex20.7350.7530.7090.6980.712C1
ex30.7580.7580.7580.7980.798C0
ex41.9801.6591.6051.6251.641C1
ex50.7540.7540.7520.7750.775C0
ex62.6754.2562.6482.6732.691C1
ex71.3591.5481.3611.3821.412C1
keyb1.5241.5021.5061.5281.541C1
kirkman2.2042.3551.9501.8541.892C2
lion0.9090.9960.9140.9530.953C0
lion91.1001.3371.0951.1121.112C0
mark11.8511.8511.6331.6611.683C1
mc0.8270.9410.8230.8630.863C0
modulo120.9150.9150.9190.9410.941C0
opus1.7501.7501.6891.7081.734C1
planet4.5534.5532.8872.9143.121C2
planet14.5534.5532.8872.9143.121C2
pma1.8181.8181.7011.7261.747C2
s13.1333.5782.9663.0893.118C2
s14884.4304.5443.9964.0014.108C2
s14943.5273.6263.4303.5233.596C2
s1a1.7702.4581.6561.6721.689C2
s2081.8583.3111.7401.7691.784C2
s271.1482.3421.1571.1641.183C1
s3861.6821.8241.5521.5711.593C1
s4201.9603.4431.9091.8121.861C4
s5102.1662.1661.7141.6431.685C4
s8201.1281.1971.1241.1421.151C1
s8322.6622.4092.0711.9151.932C4
s8402.7042.6952.4362.2642.283C4
sand1.6401.6401.4791.4211.443C3
shiftreg0.8790.9590.8680.8990.899C0
sse1.6511.7271.5201.5431.561C1
styr4.5065.2333.6493.7213.751C2
tma2.0201.7451.7521.7811.803C2
Sum96.78103.1384.7485.1186.45
Percentage, %111.95119.2998.0298.44100.00
Table 11. Possible FSM models.
S_0 (CoreFC) | S_2 (CorePC) | S_3 (CoreFD) | Model | Conditions
0 | 0 | 1 | P F Y | (35)
0 | 1 | 0 | P T Y | (42)
0 | 1 | 1 | P F T Y | (35) and (42)
1 | 0 | 0 | P Y | (18)
1 | 0 | 1 | P F F Y | (18) and (35)
1 | 1 | 0 | P 2 T Y | (18) and (42)
1 | 1 | 1 | P F 2 T Y | (18), (35), and (42)
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
