State Machines Synthesis and Implementation into FPGAs with Multiple Encoding of States
Arkadiusz Bukowiec, Alexander Barkalov, and Larysa Titarenko
Abstract — A method of synthesis and implementation of Mealy FSMs (Finite State Machines) into FPGAs (Field Programmable Gate Arrays) is proposed. The synthesis is based on architectural decomposition and multiple encoding. The set of states is divided into subsets based on the current state or the executed microinstruction, and the states are then encoded separately in each subset. The state is decoded in the second-level circuit from the multiple code together with the code of the current state or the code of the executed microinstruction. This leads to a double-level FSM structure that uses both LUTs (Look-Up Tables) and embedded memory blocks, and therefore to balanced usage of the hardware resources of an FPGA device.
Index Terms — Circuit synthesis, Field programmable gate arrays, Finite state machines, Logic design
I. Introduction
Finite state machines (FSMs) with Mealy outputs [1] are one of the most popular models used in designing control units (CUs) of digital systems. Nowadays, field programmable gate arrays (FPGAs) are very often used to implement such digital systems [6]. One of the main features of FPGAs is that their logic elements, called look-up tables (LUTs), have a restricted number of inputs [4]. However, the logic functions of FSMs (called p-functions) can have many more arguments (up to 200) than typical LUTs have inputs (up to 6). This imbalance makes functional decomposition of the Boolean functions describing the behavior of an FSM necessary [5]. The negative result of functional decomposition is an increase in the number of levels of the FSM logic circuit and in the number of LUTs required for its implementation.
On the other hand, new FPGAs are equipped with embedded memory blocks [9]. These blocks can also be used to realize combinational circuits. The problem is that an implementation using only memory blocks requires a large number of such blocks and very often exceeds the number of blocks available in an FPGA device.
Manuscript received December 22, 2008.
A. Bukowiec is with the Institute of Computer Engineering and Electronics, University of Zielona Gora, Podgorna 50, 65-246 Zielona Gora, Poland (phone: +48 68 328 2304; fax: +48 68 324 4733; e-mail: a.bukowiec@iie.uz.zgora.pl).
A. Barkalov is with the Institute of Computer Engineering and Electronics, University of Zielona Gora, Podgorna 50, 65-246 Zielona Gora, Poland (e-mail: a.barkalov@iie.uz.zgora.pl).
L. Titarenko is with the Institute of Computer Engineering and Electronics, University of Zielona Gora, Podgorna 50, 65-246 Zielona Gora, Poland (e-mail: l.titarenko@iie.uz.zgora.pl).
One way of decreasing the number of p-functions that depend on a large number of arguments is architectural decomposition of an FSM [2]. Such methods encode some parameters of the FSM. This leads to a double-level structure: a reduced number of p-functions is realized in the first-level circuit, which is implemented with LUTs, while the second-level circuit operates as a decoder and is implemented with memory blocks.
The synthesis method proposed in this article is based on encoding internal states that are divided into subsets according to the current state or the currently executed microinstruction [3]. This encoding decreases the number of p-functions implemented by the combinational circuit of the FSM. The state is decoded in the second-level circuit from the multiple code together with the code of the current state or the code of the currently executed microinstruction. Because this circuit is regular, it can be implemented with embedded memory blocks. This decreases the number of LUTs required for the logic circuit of the FSM and balances the usage of the different resources of an FPGA device.
II. Finite State Machine Definition
A finite state machine is a mathematical model of behavior composed of a finite set of input symbols, a finite nonempty set of states, a finite set of output symbols, transitions and actions [1], [2]. This model can be represented as a six-tuple:
$$S = (X, Y, A, a_1, \delta, \lambda), \tag{1}$$
where:
• $X$ is a finite set of input Boolean variables, $X = \{x_1, \ldots, x_L\}$;
• $Y$ is a finite set of output Boolean variables, called microoperations (μO), $Y = \{y_1, \ldots, y_N\}$;
• $A$ is a finite, nonempty set of states, $A = \{a_1, \ldots, a_M\}$;
• $a_1$ is the initial state of the FSM, $a_1 \in A$;
• $\delta$ is the transition function, defined as a function of a state and of the affirmation or negation of some input variables:
$$\delta : A \times X \to A; \tag{2}$$
• $\lambda$ is the output function; in the case of the Mealy model it is defined as a function of a state and of the affirmation or negation of some input variables:
$$\lambda : A \times X \to Y. \tag{3}$$
In the case of the Moore model it is defined only as a function of a state:
$$\lambda : A \to Y. \tag{4}$$
A Mealy FSM defined in this way can be specified by a direct structural table (DST) [1] with the columns $a_m$, $K(a_m)$, $a_s$, $K(a_s)$, $X_h$, $Y_h$, $\Phi_h$, $h$. Here $a_m$ is the current state of the FSM, $a_m \in A$; $K(a_m)$ is the binary code of the state $a_m$ with $R = \lceil \log_2 M \rceil$ bits, where the internal Boolean variables $q_r \in Q = \{q_1, \ldots, q_R\}$ are used to encode the states; $a_s$ is the next state, $a_s \in A$; $K(a_s)$ is the code of the state $a_s$, with $K(a_s) = K(a_m)$ for $s = m$; $X_h$ is the condition of the transition $(a_m, a_s)$, a conjunction of affirmed or negated variables from the set $X$; $Y_h$ is the microinstruction (μI) formed during the transition $(a_m, a_s)$, $Y_h \subseteq Y$; $\Phi_h$ is the set of memory excitation functions that are equal to 1 to switch the FSM from $K(a_m)$ to $K(a_s)$, $\Phi_h \subseteq \Phi = \{D_1, \ldots, D_R\}$ (as a rule, D flip-flops are used to form the memory); $h$ is the number of the DST line, $h = 1, \ldots, H$.
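To make the DST columns concrete, the following Python fragment is a minimal sketch of how such a table might be held in memory. The three-state FSM, the field names and the helper functions are hypothetical and are not taken from the article; the code width is clamped to at least one bit, which is an assumption for the degenerate single-state case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DstLine:
    """One DST line h (field names are illustrative, not from the article)."""
    current: str         # a_m, the current state
    nxt: str             # a_s, the next state
    condition: str       # X_h, conjunction of input literals; "1" = unconditional
    outputs: frozenset   # Y_h, the microinstruction (a subset of Y)


def code_width(count: int) -> int:
    """ceil(log2(count)) computed with integers; at least 1 bit (assumption)."""
    return max(1, (count - 1).bit_length())


def state_codes(states):
    """Assign K(a_m) as consecutive binary numbers with R bits."""
    width = code_width(len(states))
    return {s: format(i, f"0{width}b") for i, s in enumerate(states)}


def excitation(code: str):
    """Phi_h: the D inputs that must equal 1 to load `code` (D1 is the MSB)."""
    return {f"D{r + 1}" for r, bit in enumerate(code) if bit == "1"}


# Hypothetical FSM with M = 3 states, so R = 2.
STATES = ["a1", "a2", "a3"]
DST = [
    DstLine("a1", "a2", "x1", frozenset({"y1", "y2"})),
    DstLine("a1", "a3", "~x1", frozenset({"y3"})),
    DstLine("a2", "a3", "1", frozenset({"y1", "y2"})),
    DstLine("a3", "a1", "x2", frozenset({"y3"})),
]

CODES = state_codes(STATES)
for h, line in enumerate(DST, start=1):
    print(h, line.current, CODES[line.current], line.nxt, CODES[line.nxt],
          line.condition, sorted(line.outputs),
          sorted(excitation(CODES[line.nxt])))
```

Each printed row corresponds to one DST line with the columns in the order listed above.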
III. Base Structures of FSM Logic Circuit
The DST is used as the basis for forming the system of functions:
$$\begin{aligned} Y &= Y(Q, X), \\ \Phi &= \Phi(Q, X). \end{aligned} \tag{5}$$
This system corresponds to functions (3) and (2) and describes a single-level circuit of a Mealy FSM (Fig. 1). This structure is called the P Mealy FSM. Here the circuit P implements the system of functions (5), and the register RG represents the memory of the FSM. One of the drawbacks of the structure P is the large number of p-functions:
$$n_p(P) = N + R. \tag{6}$$
One of the known methods of decreasing this parameter is the encoding of microinstructions [2]. Let the DST contain $T$ different microinstructions $Y_t \subseteq Y$. Assign to each set $Y_t$ a binary code $K(Y_t)$ with $R_1 = \lceil \log_2 T \rceil$ bits ($t = 1, \ldots, T$), and use the variables $z_r \in Z = \{z_1, \ldots, z_{R_1}\}$ to represent these codes. In this case a Mealy FSM can be implemented as a double-level circuit (Fig. 2) named the PY Mealy FSM [2]. The register RG is exactly the same as in the previous structure. The circuit Y implements the system of functions
$$Y = Y(Z) \tag{7}$$
and transforms the code $K(Y_t)$, represented by the variables $z_r$, into the microinstruction $Y_t$, built from the microoperations $y_n$. This circuit can be implemented using embedded memory blocks. The circuit P now implements the system
$$\begin{aligned} Z &= Z(Q, X), \\ \Phi &= \Phi(Q, X), \end{aligned} \tag{8}$$
and the number of p-functions is decreased to
$$n_p(PY) = R_1 + R. \tag{9}$$
This number is still relatively large, so the structure still needs a relatively large number of LUTs to implement the circuit P, which makes it unattractive for FPGA implementation.
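As a rough numerical illustration of the effect of encoding microinstructions, the sketch below collects the distinct microinstructions of a hypothetical DST output column, assigns them codes, and counts the functions produced by the circuit P in the P and PY structures (one function per output variable of the systems that P implements). All data values and function names are assumptions made for the example.

```python
def code_width(count: int) -> int:
    """ceil(log2(count)) computed with integers; at least 1 bit (assumption)."""
    return max(1, (count - 1).bit_length())


def encode_microinstructions(output_column):
    """Give each distinct microinstruction Y_t a code K(Y_t), in first-seen order."""
    distinct = []
    for y in output_column:
        if y not in distinct:
            distinct.append(y)
    width = code_width(len(distinct))
    return {y: format(t, f"0{width}b") for t, y in enumerate(distinct)}


# Hypothetical column Y_h of a DST with N = 5 microoperations and M = 6 states.
Y_COLUMN = [
    frozenset({"y1", "y2"}),
    frozenset({"y3"}),
    frozenset({"y1", "y2"}),
    frozenset({"y2", "y4", "y5"}),
    frozenset({"y4"}),
    frozenset({"y3"}),
]
N = 5
M = 6

K_Y = encode_microinstructions(Y_COLUMN)
T = len(K_Y)           # number of distinct microinstructions
R = code_width(M)      # bits of the state code
R1 = code_width(T)     # bits of the microinstruction code

n_p_P = N + R          # circuit P produces Y and Phi directly
n_p_PY = R1 + R        # circuit P produces Z and Phi; Y comes from the memory

print(f"T={T}, R={R}, R1={R1}, n_p(P)={n_p_P}, n_p(PY)={n_p_PY}")
```

The saving grows with the ratio of microoperations to distinct microinstructions; when the number of microinstructions approaches 2^N there is nothing to gain.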
IV. Main Idea of Methods
The idea of further improvement is to also encode the next state (internal state), using the code of a microinstruction or the code of the current state as a partial code [3].
A. Multiple Encoding with use of Microinstruction Code
Let us divide the set of internal states into subsets based on the currently executed microinstruction $Y_t$. This gives $T$ subsets $A(Y_t) \subseteq A$, where $a_s \in A(Y_t)$ iff $a_s$ is the next state of a transition during which the microinstruction $Y_t$ is executed. Let $B_t = |A(Y_t)|$ and $B_0 = \max(B_1, \ldots, B_T)$. Encode the internal states $a_s$ of each subset $A(Y_t)$ separately by the binary code $K_t(a_s)$ with $R_2 = \lceil \log_2 B_0 \rceil$ bits. This code is represented by the variables $\tau_r \in \mathrm{T} = \{\tau_1, \ldots, \tau_{R_2}\}$. In this case the code of the internal state $K(a_s)$ is represented by the concatenation of the multiple code of the internal state $K_t(a_s)$ and the code of the microinstruction $K(Y_t)$:
$$K(a_s) = K_t(a_s) * K(Y_t). \tag{10}$$
A digital circuit of an FSM with such an encoding can be implemented as a double-level circuit named the PYY Mealy FSM (Fig. 3). The circuit Y implements the same system (7) as in the PY Mealy FSM. The circuit P implements the system
$$\begin{aligned} Z &= Z(Q, X), \\ \mathrm{T} &= \mathrm{T}(Q, X). \end{aligned} \tag{11}$$
This structure uses an additional circuit CC, which decodes the internal state and implements the system
$$\Phi = \Phi(Z, \mathrm{T}). \tag{12}$$
Because this circuit has a regular structure, it can be implemented using embedded memory blocks.
This structure permits further reduction of the number of p-functions to
$$n_p(PYY) = R_2 + R_1 \tag{13}$$
in comparison with the PY Mealy FSM. As a result, the number of LUTs required to implement the circuit P is also reduced, and both decoders Y and CC can be implemented with memory blocks, so that FPGA resources are used in a balanced way.
B. Multiple Encoding with use of State Code
The method in which the code of the current state is used as the partial code of an internal state is very similar to the previous one. In this case the set of internal states is divided into subsets based on the current state $a_m$. This gives $M$ subsets $A(a_m) \subseteq A$, where $a_s \in A(a_m)$ iff $a_s$ is the next state of a transition from the state $a_m$. Now, by analogy to the previous method, $C_m = |A(a_m)|$ and $C_0 = \max(C_1, \ldots, C_M)$, and the internal states are encoded by the binary code $K_m(a_s)$ with $R_3 = \lceil \log_2 C_0 \rceil$ bits. In this case the code is represented by the variables $\tau_r \in \mathrm{T} = \{\tau_1, \ldots, \tau_{R_3}\}$, and the code of the internal state $K(a_s)$ is represented by the concatenation of the multiple code of the internal state $K_m(a_s)$ and the code of the current state $K(a_m)$:
$$K(a_s) = K_m(a_s) * K(a_m). \tag{14}$$
A digital circuit of an FSM with this encoding can be implemented as a double-level circuit named the PAY Mealy FSM (Fig. 4). In this structure only the circuit CC implements a different system,
$$\Phi = \Phi(Q, \mathrm{T}), \tag{15}$$
in comparison with the PYY Mealy FSM. The circuit P implements system (11), as in the structure PYY, and the circuit Y implements system (7), as in the structures PY and PYY.
This structure also permits reduction of the number of p-functions to
$$n_p(PAY) = R_3 + R_1 \tag{16}$$
in comparison with the PY Mealy FSM. The rule of realization in an FPGA is the same as for the structure PYY: the combinational circuit P is implemented with LUTs, and both decoders CC and Y are implemented with embedded memory blocks.
It is very hard to derive a general relation between $n_p(PYY)$ and $n_p(PAY)$, because the values of these parameters depend strongly on the parameters of the implemented control algorithm (such as the number of states, the number of microinstructions, etc.). This means that the structure should be selected individually for each case.
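The two ways of dividing the set of internal states can be compared on a small example. The Python sketch below is only an illustration under stated assumptions: the transition list is hypothetical, codes inside each subset are assigned in first-seen order, and the function counts take one function per output variable of the circuit P (the Z and T variables), using the widths R1, R2 and R3 defined in the text.

```python
from collections import defaultdict


def code_width(count: int) -> int:
    """ceil(log2(count)) computed with integers; at least 1 bit (assumption)."""
    return max(1, (count - 1).bit_length())


# Hypothetical transitions: (current state a_m, microinstruction index t, next state a_s)
TRANSITIONS = [
    ("a1", 1, "a2"), ("a1", 2, "a3"),
    ("a2", 1, "a2"), ("a2", 3, "a4"),
    ("a3", 2, "a1"), ("a3", 3, "a4"),
    ("a4", 1, "a5"), ("a4", 2, "a1"),
    ("a5", 3, "a1"), ("a5", 1, "a3"),
]


def subsets(transitions, key_index):
    """Group next states by microinstruction (key_index=1) or current state (0)."""
    groups = defaultdict(list)
    for tr in transitions:
        key, nxt = tr[key_index], tr[2]
        if nxt not in groups[key]:
            groups[key].append(nxt)
    return dict(groups)


def multiple_codes(groups):
    """Encode next states separately inside every subset."""
    width = code_width(max(len(g) for g in groups.values()))
    return {(key, s): format(i, f"0{width}b")
            for key, g in groups.items() for i, s in enumerate(g)}


by_micro = subsets(TRANSITIONS, 1)   # A(Y_t), used by the PYY structure
by_state = subsets(TRANSITIONS, 0)   # A(a_m), used by the PAY structure

M = len({tr[0] for tr in TRANSITIONS} | {tr[2] for tr in TRANSITIONS})
T = len(by_micro)
B0 = max(len(g) for g in by_micro.values())
C0 = max(len(g) for g in by_state.values())
R, R1, R2, R3 = code_width(M), code_width(T), code_width(B0), code_width(C0)

n_p_PY = R1 + R
n_p_PYY = R2 + R1
n_p_PAY = R3 + R1

K_t = multiple_codes(by_micro)
K_m = multiple_codes(by_state)
print(f"PY: {n_p_PY}, PYY: {n_p_PYY}, PAY: {n_p_PAY}")
```

In this particular example the state-based division needs fewer bits, and the state a1 receives different codes in the subsets of a3 and a4; a different transition list could just as well favour the microinstruction-based division.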
V. Method of Synthesis and Implementation
A dedicated method of synthesis is proposed for the designed structures (Figs. 3 and 4). The method includes the following steps:
1) Creation and encoding of microinstructions. This step uses trivial binary encoding: each microinstruction is assigned the binary code whose value equals its index decreased by 1, so the code values range from 0 to $T - 1$.
2) Division of the set of internal states. The set of internal states is divided into $T$ or $M$ subsets. Each subset contains only the states that are next states of transitions during which the $t$-th microinstruction is executed, or of transitions from the $m$-th state.
3) Multiple encoding of internal states. Internal states are binary encoded separately in each subset, so the code values range from 0 to $B_0 - 1$ or from 0 to $C_0 - 1$. Each state $a_s$ can be assigned several different codes, one in each subset $A(Y_t)$ or $A(a_m)$.
4) Formation of the DST of the PAY Mealy FSM or the PYY Mealy FSM. This table is formed from the original DST by replacing the column $Y_h$ with the column $Z_h$, and the columns $K(a_s)$ and $\Phi_h$ with the columns $K_t(a_s)$ ($K_m(a_s)$) and $\mathrm{T}_h$. The column $Z_h$ contains the variables $z_r$ that are equal to 1 in the code $K(Y_t)$. The column $K_t(a_s)$ ($K_m(a_s)$) contains the multiple code of the internal state. The column $\mathrm{T}_h$ contains the variables $\tau_r$ that are equal to 1 in the code $K_t(a_s)$ ($K_m(a_s)$).
5) Formation of the microoperations decoder table. This table contains the columns $K(Y_t)$, $Y_t$, $t$. The column $K(Y_t)$ contains the binary code of the microinstruction from the column $Y_t$. The column $Y_t$ should be written in a binary format. The column $t$ is the number of the line, $t = 1, \ldots, T$.
6) Formation of the internal state code converter table. This table contains the columns $K_t(a_s)$ ($K_m(a_s)$), $K(Y_t)$ ($K(a_m)$), $K(a_s)$, $i$. The column $K_t(a_s)$ ($K_m(a_s)$) contains the multiple code of the internal state $a_s$ for the $t$-th microinstruction (the $m$-th state). The $t$-th microinstruction (the $m$-th state) is represented by the code from the column $K(Y_t)$ ($K(a_m)$), and the internal state $a_s$ is represented by the code from the column $K(a_s)$. The column $i$ is the number of the line, $i = 1, \ldots, \sum_{t=1}^{T} B_t$ ($i = 1, \ldots, \sum_{m=1}^{M} C_m$).
7) Formation of the logic equations of the circuit P. These equations form the systems Z and T and are derived from the DST of the PAY Mealy FSM or the PYY Mealy FSM.
8) Implementation of the logic circuit of the PAY Mealy FSM or the PYY Mealy FSM. The combinational circuit P and the register RG are implemented with CLBs of an FPGA: the circuit P with LUTs and the register RG with D flip-flops. The circuit Y is implemented with memory blocks, where $K(Y_t)$ is an address and $Y_t$ is the word stored at this address. The content of this memory is described by the microoperations decoder table. The circuit CC is also implemented with memory blocks, where an address is formed as the concatenation of $K_t(a_s)$ and $K(Y_t)$ (or $K_m(a_s)$ and $K(a_m)$) and the word stored at this address is $K(a_s)$. The content of this memory is described by the internal state code converter table.
Fig. 6. Schematic diagram of PAY Mealy FSM
The schematic diagrams of the PYY Mealy FSM logic circuit (Fig. 5) and the PAY Mealy FSM logic circuit (Fig. 6) are based on the architecture of Xilinx Virtex FPGAs [9], but they can easily be adapted to FPGAs of other vendors, because all logic elements, especially LUTs and memory blocks, and their connections are very similar.
The clock signal for the memory blocks is the same as for the register, but the memory blocks are triggered by the opposite edge (in this case the falling edge). As a result, the data are ready to be read within one clock cycle, and there is no need to wait an additional clock cycle until the data are stable. This is especially important when the internal state is encoded. It also means that the memory blocks work as an output register when microoperations are encoded.
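The two tables that describe the memory contents (the microoperations decoder table and the internal state code converter table) can be sketched as address-to-word mappings. The Python fragment below is an illustrative sketch for the PAY variant only; the small FSM, the ordering of states inside each subset and all function names are assumptions, not the output of the authors' tool.

```python
def code_width(count: int) -> int:
    """ceil(log2(count)) computed with integers; at least 1 bit (assumption)."""
    return max(1, (count - 1).bit_length())


def to_bits(value: int, width: int) -> str:
    return format(value, f"0{width}b")


# Hypothetical FSM data.
STATES = ["a1", "a2", "a3", "a4"]
OUTPUTS = ["y1", "y2", "y3"]
MICROINSTRUCTIONS = [("y1",), ("y2", "y3"), ("y1", "y3")]   # Y_1 .. Y_T
NEXT_STATES = {            # A(a_m): next states reachable from each state
    "a1": ["a2", "a3"],
    "a2": ["a4"],
    "a3": ["a1", "a4", "a2"],
    "a4": ["a1"],
}


def decoder_table(microinstructions, outputs):
    """Circuit Y: address K(Y_t) -> word Y_t written in binary format."""
    r1 = code_width(len(microinstructions))
    return {to_bits(t, r1): "".join("1" if y in mi else "0" for y in outputs)
            for t, mi in enumerate(microinstructions)}


def converter_table(states, next_states):
    """Circuit CC (PAY): address K_m(a_s) * K(a_m) -> word K(a_s)."""
    r = code_width(len(states))
    r3 = code_width(max(len(v) for v in next_states.values()))
    k = {s: to_bits(i, r) for i, s in enumerate(states)}
    rom = {}
    for a_m, successors in next_states.items():
        for j, a_s in enumerate(successors):
            rom[to_bits(j, r3) + k[a_m]] = k[a_s]
    return rom


Y_ROM = decoder_table(MICROINSTRUCTIONS, OUTPUTS)
CC_ROM = converter_table(STATES, NEXT_STATES)

for address, word in sorted(CC_ROM.items()):
    print(address, "->", word)
```

The converter table has one line per pair (current state, reachable next state), which matches the line count given for the state-based variant; addresses that never occur are simply left unprogrammed in this sketch.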
VI. Automata Synthesis System
The Automata Synthesis (AaS) System [12] was created to perform the logic synthesis of FSMs using the designed structures. The software was written in Borland C++ Builder and works in batch mode under the Windows XP operating system.
The input for the AaS System is an FSM described in the KISS2 format [8]. The output is a set of files that represent the structural description of the selected type of FSM in Verilog HDL [7]. The combinational circuit is described by a set of logic equations using the assign statement. The content of the memories is described using the case statement. Because each memory should be synthesized as a synchronous ROM, this statement is placed in an always block with the falling edge of the CLK signal on the sensitivity list. The address is the selector of the case statement, and the content of the memory is described by the choices of the case statement. To ensure that a module described in this way is synthesized as a memory block, the special synthesis attribute bram_map must be set to “yes” [10]. This is a synthesis attribute for Xilinx devices and it is ignored during synthesis for FPGA devices from other vendors, but each vendor supplies similar attributes or directives, e.g. the romstyle synthesis attribute for Altera FPGAs [11]. These files can then be the entry point for further synthesis and implementation into a selected FPGA device (Fig. 7).
TABLE I
Synthesis Results
Benchmark  Resources  Type of structure
                      P     PY    PAY   PYY
ex4        Slices     20    16    9     10
ex4        LUTs       36    29    16    18
ex4        FFs        4     4     4     4
ex4        BRAMs      0     1     2     2
ex6        Slices     29    19    24    15
ex6        LUTs       52    34    43    27
ex6        FFs        3     3     3     3
ex6        BRAMs      0     1     2     2
keyb       Slices     51    56    37    49
keyb       LUTs       90    99    65    86
keyb       FFs        5     5     5     5
keyb       BRAMs      0     1     2     2
opus       Slices     22    16    14    9
opus       LUTs       39    29    24    16
opus       FFs        4     4     4     4
opus       BRAMs      0     1     2     2
planet     Slices     141   90    64    75
planet     LUTs       248   155   113   131
planet     FFs        6     6     6     6
planet     BRAMs      0     2     3     5
s298       Slices     529   628   238   398
s298       LUTs       951   1143  433   719
s298       FFs        19    26    13    19
s298       BRAMs      0     1     5     3
sand       Slices     113   115   89    84
sand       LUTs       199   205   156   148
sand       FFs        5     5     5     5
sand       BRAMs      0     1     2     2
styr       Slices     112   109   87    90
styr       LUTs       199   192   155   160
styr       FFs        5     5     5     5
styr       BRAMs      0     1     2     2
tma        Slices     55    46    26    26
tma        LUTs       97    80    45    45
tma        FFs        5     5     5     5
tma        BRAMs      0     1     2     2
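The memory description style outlined above (a case statement inside an always block that is sensitive to the falling clock edge) can be illustrated with a small generator. The Python sketch below is not the AaS System itself; the module and port names are invented, the ROM contents are hypothetical, and writing the bram_map attribute in the Verilog-2001 `(* ... *)` form before the module is an assumption that should be checked against the XST User Guide [10].

```python
def verilog_rom(name, rom):
    """Return Verilog text for a synchronous ROM read on the falling clock edge.

    `rom` maps address bit strings to data bit strings of uniform width.
    """
    addr_width = len(next(iter(rom)))
    data_width = len(next(iter(rom.values())))
    lines = [
        '(* bram_map = "yes" *)  // assumed attribute syntax, verify for your tool',
        f"module {name} (",
        "    input  wire clk,",
        f"    input  wire [{addr_width - 1}:0] addr,",
        f"    output reg  [{data_width - 1}:0] data",
        ");",
        "    always @(negedge clk)",
        "        case (addr)",
    ]
    # One case choice per programmed address.
    for addr in sorted(rom):
        lines.append(
            f"            {addr_width}'b{addr}: data <= {data_width}'b{rom[addr]};"
        )
    # Unused addresses read as zero in this sketch (a design choice, not a rule).
    lines.append(f"            default: data <= {data_width}'b{'0' * data_width};")
    lines += ["        endcase", "endmodule"]
    return "\n".join(lines)


# Hypothetical microoperations decoder: K(Y_t) -> Y_t in binary format.
Y_ROM = {"00": "100", "01": "011", "10": "101"}
VERILOG = verilog_rom("rom_y", Y_ROM)
print(VERILOG)
```

Whether a synthesis tool actually maps such a module onto a block RAM depends on the tool, the device and the memory size, so the generated text should be treated as a starting point.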
The system also generates a report file in which the number of logic functions and the size of the memory are calculated. A macro for running third-party synthesis can also be generated (in this version of AaS only Xilinx XST is supported).
The implemented methods of synthesis were tested using benchmarks from the LGSynth91 library [8]. The results of synthesis for selected FSMs are presented in Table I. It can be seen that one of the proposed methods of synthesis (PAY or PYY) always gives better results (in bold) than the standard ones (P and PY). The analysis of all benchmarks showed that the structure PAY reduces the number of required LUTs on average by 33% in comparison with the structure P and by 30% in comparison with the structure PY. The gain for the structure PYY is 19% and 16%, respectively. The results show that the PAY Mealy FSM generally gives better results than the PYY Mealy FSM, but in some cases the PYY structure is more beneficial. Which method is better depends only on the characteristics of the implemented control algorithm.
Fig. 7. Design flow for FPGAs with AaS (blocks shown: Structural and Logic Description (Verilog), Device Library, Implementation, Bitstream)
The proposed methods require more memory bits than the PY Mealy FSM. However, new FPGA devices have embedded memory blocks with large capacity, and these blocks can be used to implement the circuits Y and CC.
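The per-benchmark comparison can be reproduced from the LUT rows of Table I. The Python sketch below transcribes those rows and computes the relative LUT reduction and the structure with the fewest LUTs. It covers only the nine benchmarks listed in the table, so its averages are not expected to match the figures quoted for the complete benchmark set; the function names are illustrative.

```python
STRUCTURES = ("P", "PY", "PAY", "PYY")

# LUT counts transcribed from Table I, in the order P, PY, PAY, PYY.
LUTS = {
    "ex4":    (36, 29, 16, 18),
    "ex6":    (52, 34, 43, 27),
    "keyb":   (90, 99, 65, 86),
    "opus":   (39, 29, 24, 16),
    "planet": (248, 155, 113, 131),
    "s298":   (951, 1143, 433, 719),
    "sand":   (199, 205, 156, 148),
    "styr":   (199, 192, 155, 160),
    "tma":    (97, 80, 45, 45),
}


def reduction(reference: int, improved: int) -> float:
    """Relative saving in percent with respect to the reference structure."""
    return 100.0 * (reference - improved) / reference


def best_structures(row):
    """Names of the structures with the lowest LUT count (ties are kept)."""
    lowest = min(row)
    return [name for name, value in zip(STRUCTURES, row) if value == lowest]


def average_reduction(reference: str, improved: str) -> float:
    """Mean saving over the listed benchmarks only."""
    i, j = STRUCTURES.index(reference), STRUCTURES.index(improved)
    gains = [reduction(row[i], row[j]) for row in LUTS.values()]
    return sum(gains) / len(gains)


for name, row in LUTS.items():
    print(f"{name:7s} best: {'/'.join(best_structures(row)):8s} "
          f"PAY vs P: {reduction(row[0], row[2]):5.1f}%  "
          f"PYY vs P: {reduction(row[0], row[3]):5.1f}%")
```

For every listed benchmark the lowest LUT count belongs to PAY or PYY, which agrees with the observation that one of the proposed structures is always the better choice.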
VII. Conclusion
The methods of synthesis and implementation of Mealy FSMs proposed in this article, with multiple encoding of internal states based on the current state or the currently executed microinstruction, decrease the number of logic elements required to implement the combinational circuit of an FSM. Realizing the decoders with memory blocks makes it possible to use the different kinds of resources available in new FPGA devices. This leads to balanced utilization of device resources in the synthesis of control units.
The Automata Synthesis System was created to verify the proposed methods of synthesis. The obtained results show that the designed methods reduce the number of LUTs required to implement the combinational circuit of an FSM. The analyzed benchmarks showed that these methods are better than the standard ones. The gain depends very strongly on the parameters of the considered control algorithm, so the structure should be selected individually for each control algorithm.
References
[1] S. Baranov, Logic Synthesis for Control Automata. Boston: Kluwer Academic Publishers, 1994.
[2] A. Barkalov and M. Węgrzyn, Design of Control Units with Programmable Logic. Zielona Gora: University of Zielona Gora Press, 2006.
[3] A. Bukowiec, A. Barkalov, and L. Titarenko, “FSMs implementation into FPGAs with multiple encoding of states,” in Proc. IEEE East-West Design & Test Symposium, Lviv, Ukraine, 2008, pp. 72-75.
[4] J. Jenkins, Designing with FPGAs and CPLDs. New Jersey: Prentice Hall, 1994.
[5] T. Luba, M. Rawski, and Z. Jachna, “Functional decomposition as a universal method of logic synthesis for digital circuits,” in Proc. 9th Int. Conf. Mixed Design of Integrated Circuits and Systems, Wroclaw, Poland, 2002, pp. 285-290.
[6] Z. Salcic, VHDL and FPLDs in Digital Systems Design, Prototyping and Customization. Boston: Kluwer Academic Publishers, 1998.
[7] D. Thomas and P. Moorby, The Verilog Hardware Description Language. Norwell: Kluwer Academic Publishers, 2002.
[8] S. Yang, “Logic Synthesis and Optimization Benchmarks User Guide, version 3.0,” Microelectronics Center of North Carolina, Tech. Rep. 1991-IWLS-UG-Saeyang, Jan. 1991.
[9] Virtex 2.5V Field Programmable Gate Arrays Data Sheet, Xilinx, San Jose, 2001.
[10] XST User Guide (8.1i), Xilinx, San Jose, 2005.
[11] Design and Synthesis vol. 1 of Quartus II Development Software Handbook (v8.0), Altera, San Jose, 2008.
[12] A. Bukowiec. (2008, May). Automata Synthesis System [Online]. Available: http://willow.iie.uz.zgora.pl/~abukowie/AS/as.htm