Optimization of Control Unit with Code Sharing
Aleksander A. Barkalov, Member, IEEE, Larysa A. Titarenko, Aleksander N. Miroshkin
Abstract - A new design method for compositional microprogram control units with code sharing is proposed. The method aims to reduce the number of PAL macrocells in the combinational part of the control unit. Additional control microinstructions containing the codes of the classes of pseudoequivalent chains are used to modify operational linear chains. The proposed method is illustrated by an example. Research results for various graph-schemes of algorithm (GSA) are illustrated with diagrams, and the GSA characteristics most favorable for the proposed method are identified.
Index Terms - Circuit synthesis, flow graphs, logic devices, minimization methods.
I. Introduction
A control unit (CU) is one of the most important blocks of any digital system [1]. Reducing the amount of hardware is an important problem in the implementation of CU logic circuits [2]. To solve it, both the peculiarities of the control algorithm to be implemented and the logic elements in use should be taken into account. In this article we propose a solution for the case when a linear control algorithm is implemented using complex programmable logic devices (CPLD). We discuss the case when macrocells of programmable array logic (PAL) and embedded memory blocks (EMB) are used in a CPLD chip [3, 4]. In a linear algorithm, operator vertices make up more than 75% of all vertices [5]. Compositional microprogram control units (CMCU) [5] are widely used for interpretation of linear algorithms. An approach based on the existence of pseudoequivalent operational linear chains (POLC) is proposed in [6, 7] for optimization of CMCU with code sharing [5]. However, this approach does not decrease the amount of hardware in the block of microoperations. This article develops the approach further by encoding the collections of microoperations [2].
The aim of this research is to optimize the CMCU logic circuit by introducing into the microinstruction format special fields with the codes of the classes of POLCs and of the collections of microoperations.
The task of this research is the development of a synthesis method that decreases the number of PAL macrocells and EMBs in the logic circuit of a CMCU. A control algorithm is represented by a graph-scheme of algorithm (GSA) [8, 9].
Manuscript received March 7, 2009.
A. A. Barkalov is with University of Zielona Gora, Poland.
L. A. Titarenko is with University of Zielona Gora, Poland.
A. N. Miroshkin is with Donetsk National Technical University, Donetsk, Ukraine.
II. Analysis of CMCU with code sharing
Let a control algorithm to be interpreted be represented by a graph-scheme of algorithm (GSA) $\Gamma$ [9]. Let this GSA be characterized by the set of vertices $B = \{b_0, b_E\} \cup E_1 \cup E_2$ and the set of arcs $E$, where $b_0$ is the initial vertex, $b_E$ is the final vertex, $E_1$ is the set of operator vertices, and $E_2$ is the set of conditional vertices. Each operator vertex $b_q \in E_1$ contains a collection of microoperations $Y(b_q) \subseteq Y$, where $Y = \{y_1, \ldots, y_N\}$ is the set of data-path microoperations. Each conditional vertex $b_q \in E_2$ contains a single element $x_l \in X$, where $X = \{x_1, \ldots, x_L\}$ is the set of logical conditions (input signals). A GSA $\Gamma$ is called a linear GSA [5] if its operator vertices make up more than 75% of the total number of vertices in the GSA.
Let the set $C = \{a_1, \ldots, a_G\}$ be constructed for GSA $\Gamma$, where $a_g \in C$ is an operational linear chain (OLC) [5]. Any component $b_{g_i}$ of OLC $a_g \in C$ belongs to the set $E_1$ $(i = 1, \ldots, F_g)$. Each pair of adjacent components $b_{g_i}, b_{g_{i+1}}$ corresponds to the arc $\langle b_{g_i}, b_{g_{i+1}} \rangle \in E$, where $i = 1, \ldots, F_g - 1$, $g = 1, \ldots, G$. Each OLC $a_g \in C$ has only one output $O_g$ and an arbitrary number of inputs. Formal definitions of an OLC, its inputs, and its output can be found in [5]. Each vertex $b_q \in E_1$ corresponds to the microinstruction $MI_q$ kept in the cell of the control memory (CM) with address $A_q$. It is enough to use
$$R = \lceil \log_2 M \rceil \tag{1}$$
bits for microinstruction addressing, where $M = |E_1|$. Let each OLC $a_g \in C$ include $F_g$ components and $Q = \max(F_1, \ldots, F_G)$. Let each OLC $a_g \in C$ be encoded by a binary code $K(a_g)$ having
$$R_1 = \lceil \log_2 G \rceil \tag{2}$$
bits, and let variables $\tau_r \in \tau$ be used for this encoding, where $|\tau| = R_1$. Let each component $b_q \in E_1$ be encoded by a binary code $K(b_q)$ having
$$R_2 = \lceil \log_2 Q \rceil \tag{3}$$
R&I, 2009, №2
bits, and let variables $T_r \in T$ be used for this encoding, where $|T| = R_2$. The components are encoded in such a manner that the condition
$$K(b_{g_{i+1}}) = K(b_{g_i}) + 1 \tag{4}$$
holds for each OLC $a_g \in C$ $(i = 1, \ldots, F_g - 1)$. If the condition
$$R_1 + R_2 = R \tag{5}$$
holds, then the model of CMCU with code sharing $U_1$ can be used for interpretation of GSA $\Gamma$ (Fig. 1).
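Formulas (1)-(3) and condition (5) are easy to check numerically. The following is a minimal sketch, not part of the authors' tooling; the helper name `code_width` is hypothetical, and the OLC lengths are assumed values (they coincide with the lengths used in the worked example of Section IV).

```python
def code_width(n: int) -> int:
    """Bits needed to give n items distinct binary codes, i.e. ceil(log2(n))."""
    return (n - 1).bit_length()

# Assumed OLC lengths F_1..F_G (hypothetical input, same as the later example).
olc_lengths = [3, 4, 2, 3, 4, 3, 2, 3]

M = sum(olc_lengths)   # number of operator vertices, M = |E1|
G = len(olc_lengths)   # number of OLCs
Q = max(olc_lengths)   # length of the longest OLC

R = code_width(M)      # formula (1)
R1 = code_width(G)     # formula (2)
R2 = code_width(Q)     # formula (3)

# Condition (5): code sharing is applicable only if R1 + R2 == R.
code_sharing_possible = (R1 + R2 == R)
print(M, G, Q, R, R1, R2, code_sharing_possible)  # 24 8 4 5 3 2 True
```

Integer `bit_length` is used instead of floating-point `log2` to avoid rounding surprises near powers of two.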
In CMCU $U_1$, a block of microinstruction addressing (BMA) implements the systems of input memory functions for the counter CT and the register RG:
$$\Phi = \Phi(\tau, X); \qquad \Psi = \Psi(\tau, X). \tag{6}$$
Let us point out that in the case of CMCU $U_1$ the address of a microinstruction is represented as
$$A(b_q) = K(a_g) * K(b_q), \tag{7}$$
where $b_q$ is a component of OLC $a_g \in C$ and "*" is the concatenation sign. The CMCU $U_1$ operates in the following order.
If Start = 1, then the initial address (all zeros) is loaded into RG and CT. At the same time, the flip-flop TF is set, which causes Fetch = 1, so microinstructions can be read out of the control memory. Each cell of the CM keeps microoperations $y_n \in Y$ and the special variables $y_0$ and $y_E$. If $y_0 = 1$, then the current content of CT is incremented; otherwise both CT and RG are loaded from the BMA. The first case corresponds to a transition from any OLC component except the OLC output. The second case corresponds to a transition from an OLC output. If $y_E = 1$, then the flip-flop TF is reset, the signal Fetch = 0, and the operation of the CMCU is terminated. This corresponds to a transition from a vertex $b_q \in E_1$ such that $\langle b_q, b_E \rangle \in E$. The pulse Clock is used for timing of the CMCU.
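The operating cycle described above can be sketched as a small behavioural model. This is only an illustration under stated assumptions: the function `run_cmcu`, the toy control memory, and the toy BMA function are hypothetical and are not taken from the paper; RG holds the OLC code, CT holds the component code, and the address is their concatenation as in (7).

```python
def run_cmcu(control_memory, bma, r2, x, max_steps=1000):
    """Behavioural sketch of a CMCU with code sharing.

    control_memory maps an address to (microoperations, y0, yE);
    bma(rg, x) returns the next (OLC code, component code) pair.
    """
    rg, ct = 0, 0                      # Start = 1: zero address is loaded
    executed = []
    for _ in range(max_steps):         # Fetch = 1 while the loop runs
        address = (rg << r2) | ct      # A(bq) = K(ag) * K(bq)
        microops, y0, y_e = control_memory[address]
        executed.append(microops)
        if y_e:                        # yE = 1: TF is reset, Fetch = 0
            break
        if y0:                         # inside an OLC: increment CT
            ct += 1
        else:                          # OLC output: load RG and CT from BMA
            rg, ct = bma(rg, x)
    return executed

# Hypothetical toy algorithm: a1 = <b1, b2>, a2 = <b3>, a3 = <b4>; R2 = 1.
toy_memory = {
    0b000: (("y1",), 1, 0),            # b1, not an output, so y0 = 1
    0b001: (("y2",), 0, 0),            # b2, output of a1
    0b010: (("y3",), 0, 1),            # b3, followed by the final vertex
    0b100: (("y4",), 0, 1),            # b4, followed by the final vertex
}

def toy_bma(rg, x):
    """After a1, go to a2 if x1 = 1, otherwise to a3 (assumed transitions)."""
    if rg == 0:
        return (1, 0) if x["x1"] else (2, 0)
    raise ValueError("no transition defined for this OLC")

print(run_cmcu(toy_memory, toy_bma, r2=1, x={"x1": 1}))
print(run_cmcu(toy_memory, toy_bma, r2=1, x={"x1": 0}))
```

The model ignores timing and treats one loop iteration as one Clock pulse.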
Let us point out that OLCs $a_i, a_j \in C$ are pseudoequivalent OLCs [5] if their outputs are connected with the input of the same vertex of GSA $\Gamma$. The amount of hardware in the logic circuit of the BMA can be decreased by introducing a special block, named a code transformer (TC), that transforms the OLC codes into the codes of the classes of pseudoequivalent OLCs [5]. However, the TC consumes some resources of the chip in use.
In this article we propose to use free cells of the CM for this transformation. To reduce the number of EMBs in the control memory, we propose to use the maximal encoding of collections of microoperations [2].
III. Main idea of proposed method
Let $C_1 \subseteq C$ be the set of OLCs whose outputs are not connected with the vertex $b_E$. Let us find the partition $\Pi_C = \{B_1, \ldots, B_I\}$ of the set $C_1$ into the classes of POLCs. Let us encode the classes $B_i \in \Pi_C$ by binary codes $K(B_i)$ having $R_B$ bits, where
$$R_B = \lceil \log_2 I \rceil. \tag{8}$$
Let us use variables $v_r \in V$ for this encoding, where $|V| = R_B$.
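The partition into classes of pseudoequivalent OLCs amounts to grouping OLCs by the vertex that follows their output. The sketch below illustrates this; the function name and the vertex labels `c1`..`c4` (standing for unnamed conditional vertices) are hypothetical, and the grouping is chosen to reproduce the classes used in the example of Section IV.

```python
from collections import defaultdict

def pseudoequivalent_classes(successor, final_vertex="bE"):
    """Group OLCs whose outputs lead to the same vertex.

    OLCs whose output is connected with the final vertex are skipped,
    so only the set C1 is partitioned.
    """
    classes = defaultdict(list)
    for olc, vertex in successor.items():
        if vertex != final_vertex:
            classes[vertex].append(olc)
    return list(classes.values())

# Assumed successor of each OLC output (labels c1..c4 are hypothetical).
successor = {
    "a1": "c1", "a2": "c2", "a3": "c2", "a4": "c3",
    "a5": "c3", "a6": "b20", "a7": "c4", "a8": "bE",
}

classes = pseudoequivalent_classes(successor)
I = len(classes)
R_B = (I - 1).bit_length()   # formula (8), ceil(log2(I))
print(classes)               # [['a1'], ['a2', 'a3'], ['a4', 'a5'], ['a6'], ['a7']]
print(I, R_B)                # 5 3
```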
In the process of CMCU synthesis, the initial GSA $\Gamma$ is transformed, and the additional variables $y_0$ and $y_E$ are introduced into its operator vertices. Thus, the initial set $Y$ is transformed into the set $Y_C = Y \cup \{y_0, y_E\}$. Let the set $Y_C$ include $Q_1$ different collections of microoperations (CMO). Let us encode each collection $Y_q$ by a binary code $K(Y_q)$ having $R_Y$ bits, where
$$R_Y = \lceil \log_2 Q_1 \rceil. \tag{9}$$
Let us use variables $z_r \in Z$ for this encoding, where $|Z| = R_Y$. In this case the control memory includes two blocks [5], namely a block of micromemory (BMM) and a block of microoperations (BMO). The BMM generates the functions
$$Z = Z(\tau, T), \tag{10}$$
and the BMO generates the variables
$$Y_C = Y_C(Z). \tag{11}$$
In this article we propose to include the fields $K(B_i)$ and $K(Y_q)$ in the microinstruction format. These microinstructions include $R_I$ bits, where
$$R_I = R_B + R_Y. \tag{12}$$
Both the BMM and the BMO are implemented using EMBs having $t$ outputs. Assume that each EMB includes $q$ words and
$$q \ge \max(M, Q_1). \tag{13}$$
The block BMM has $R_Y$ outputs and is implemented using $n_1$ EMBs, where
$$n_1 = \left\lceil \frac{R_Y}{t} \right\rceil. \tag{14}$$
In this case, there are $R_3$ free bits in the word of the BMM, where
$$R_3 = n_1 t - R_Y. \tag{15}$$
These free bits can be used for keeping some part $V^1$ of the code $K(B_i)$.
All bits of $K(B_i)$ are generated by the BMM if the following condition holds:
$$R_3 \ge R_B. \tag{16}$$
Otherwise, the block of code transformer (BCT) is used to generate the remaining $R_4$ bits, where
$$R_4 = R_B - R_3. \tag{17}$$
These bits form the part $V^2$ of the code $K(B_i)$. This approach leads to the CMCU $U_2$ (Fig. 2).
Fig. 2. Structural diagram of CMCU $U_2$
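The bookkeeping of formulas (14)-(17) can be sketched as follows. The function name `bmm_split` is hypothetical; the first call uses the values of the example in Section IV ($R_Y = 3$, $R_B = 3$, $t = 2$), and the second call uses an assumed wider EMB to show the case where no BCT is needed.

```python
def bmm_split(r_y: int, r_b: int, t: int):
    """Return (n1, R3, R4) for a BMM built from EMBs with t outputs.

    n1: number of EMBs in the BMM, formula (14)
    R3: free bits in a BMM word, formula (15)
    R4: class-code bits left for the BCT, formula (17); 0 if (16) holds
    """
    n1 = -(-r_y // t)            # ceiling division
    r3 = n1 * t - r_y
    r4 = max(0, r_b - r3)
    return n1, r3, r4

print(bmm_split(r_y=3, r_b=3, t=2))   # (2, 1, 2): BCT generates two bits
print(bmm_split(r_y=3, r_b=3, t=8))   # (1, 5, 0): condition (16) holds, no BCT
```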
In CMCU $U_2$, the block BMA implements the functions
$$\Phi = \Phi(V, X), \tag{18}$$
$$\Psi = \Psi(V, X), \tag{19}$$
and the block BCT implements the functions
$$V^2 = V^2(\tau). \tag{20}$$
The following conditions hold:
$$V^1 \cup V^2 = V, \tag{21}$$
$$V^1 \cap V^2 = \emptyset. \tag{22}$$
The functions of the other blocks have already been discussed. Let us point out that the logic circuits of BMA, CT, RG, and TF are implemented using PAL macrocells, whereas the circuits of BMM and BMO are implemented using EMBs. The logic circuit of the BCT can be implemented using either PAL macrocells or EMBs.
In this article the following synthesis method is proposed for the CMCU $U_2$:
1. Construction of the sets $C$, $C_1$, and $\Pi_C$ for GSA $\Gamma$.
2. Encoding of OLCs, their components, and classes.
3. Encoding of the collections of microoperations $Y_q \subseteq Y_C$.
4. Construction of the control memory contents for the blocks BMM and BMO.
5. Construction of the CMCU transition table.
6. Construction of the BCT table.
7. Logic synthesis of the CMCU logic circuit.
IV. Application of proposed method
Let a GSA $\Gamma_1$ be represented by the set $C = \{a_1, \ldots, a_8\}$, where $a_8 \notin C_1$, and by the partition $\Pi_C = \{B_1, \ldots, B_5\}$, where $B_1 = \{a_1\}$, $B_2 = \{a_2, a_3\}$, $B_3 = \{a_4, a_5\}$, $B_4 = \{a_6\}$, $B_5 = \{a_7\}$, $a_1 = \langle b_1, b_2, b_3 \rangle$, $a_2 = \langle b_4, \ldots, b_7 \rangle$, $a_3 = \langle b_8, b_9 \rangle$, $a_4 = \langle b_{10}, b_{11}, b_{12} \rangle$, $a_5 = \langle b_{13}, \ldots, b_{16} \rangle$, $a_6 = \langle b_{17}, \ldots, b_{19} \rangle$, $a_7 = \langle b_{20}, b_{21} \rangle$, $a_8 = \langle b_{22}, b_{23}, b_{24} \rangle$. Therefore, we obtain the following values and sets. The number of OLCs is $G = 8$, and $R_1 = 3$ variables from the set $\tau = \{\tau_1, \tau_2, \tau_3\}$ are used for their encoding. The maximum OLC length is $Q = 4$ vertices, and $R_2 = 2$ variables from the set $T = \{T_1, T_2\}$ are enough for encoding the components. The number of operator vertices in the GSA is $M = 24$, so $R = 5$ bits are necessary for their addressing. Hence, condition (5) holds and code sharing can be used. $R_B = 3$ variables are enough for encoding the classes $B_i \in \Pi_C$, so $V = \{v_1, v_2, v_3\}$.
Let us encode the OLCs $a_g \in C$ and the classes $B_i \in \Pi_C$ in the following way: $K(a_1) = 000, \ldots, K(a_8) = 111$, $K(B_1) = 000, \ldots, K(B_5) = 100$. To satisfy condition (4), let the first component of each OLC $a_g \in C$ have code 00, the second 01, the third 10, and the fourth 11. This leads to the microinstruction addresses $A(b_q)$ shown in Table 1.
TABLE 1
Microinstruction Addresses for CMCU $U_2(\Gamma_1)$
(columns: OLC codes $K(a_g)$; rows: component codes $K(b_q)$)
Address  000  001  010  011  100  101  110  111
00       b1   b4   b8   b10  b13  b17  b20  b22
01       b2   b5   b9   b11  b14  b18  b21  b23
10       b3   b6   *    b12  b15  b19  *    b24
11       *    b7   *    *    b16  *    *    *
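The addresses of Table 1 follow mechanically from formula (7): the OLC code is concatenated with the position of the component inside the OLC. A minimal sketch is given below; the function name `olc_addresses` is hypothetical, and it assumes that OLCs are numbered from zero in the order $a_1, \ldots, a_8$, as in the encoding chosen above.

```python
def olc_addresses(olcs, r1: int, r2: int):
    """Map each vertex to its address K(ag) * K(bq) as a bit string."""
    table = {}
    for g, components in enumerate(olcs):        # g is the OLC code
        for i, vertex in enumerate(components):  # i is the component code
            table[vertex] = format(g, f"0{r1}b") + format(i, f"0{r2}b")
    return table

# First and last vertex index of each OLC a1..a8 from the example.
bounds = [(1, 3), (4, 7), (8, 9), (10, 12), (13, 16), (17, 19), (20, 21), (22, 24)]
olcs = [[f"b{q}" for q in range(lo, hi + 1)] for lo, hi in bounds]

addr = olc_addresses(olcs, r1=3, r2=2)
print(addr["b5"], addr["b15"])   # 00101 10010
```

Because consecutive components receive consecutive codes, condition (4) is satisfied by construction.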
From Table 1 we can derive, for example, that
A(b5) = 00101, A(b15) = 10010 , and so on. Replacement of vertices by corresponding collections of microoperations in Table 1 results in the content of control memory (Table 2).
TABLE 2
Control Memory Content for CMCU $U_2(\Gamma_1)$
Address  000       001       010       011       100       101       110       111
00       y0,y1,y2  y0,y3,y5  y0,y1,y2  y0,y3,y6  y0,y3,y5  y0,y1,y2  y0,y3,y6  y0,y3,y9
01       y0,y3,y9  y0,y3,y9  y1,y7     y0,y3,y9  y0,y3,y9  y0,y3,y5  y8        y0,y3,y6
10       y4        y0,y3,y6  *         y8        y0,y1,y2  y4        *         y1,y2,yE
11       *         y8        *         *         y1,y7     *         *         *
Obviously, the collections of microoperations are taken from the GSA $\Gamma_1$, which is not shown here. As follows from Table 2, the control memory includes $Q_1 = 8$ collections of microoperations, namely: $Y_1 = \{y_0, y_1, y_2\}$, $Y_2 = \{y_0, y_3, y_9\}$, $Y_3 = \{y_4\}$, $Y_4 = \{y_0, y_3, y_5\}$, $Y_5 = \{y_0, y_3, y_6\}$, $Y_6 = \{y_1, y_7\}$, $Y_7 = \{y_8\}$, $Y_8 = \{y_1, y_2, y_E\}$. They can be encoded using $R_Y = 3$ variables, therefore $Z = \{z_1, z_2, z_3\}$.
Let the EMBs in use have $t = 2$ outputs; then the number of EMBs used is $n_1 = 2$ and the number of unused bits is $R_3 = 1$. This means that one bit of the code $K(B_i)$ can be generated by the block BMM. Let the variables $v_r \in V$ be divided between $V^1$ and $V^2$ in the following way: $V^1 = \{v_1\}$, $V^2 = \{v_2, v_3\}$.
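The maximal encoding of collections of microoperations can be sketched as assigning consecutive binary codes to the distinct collections found in the control memory. The function name `encode_collections` is hypothetical, and numbering the collections in order of first appearance is an assumption made here for illustration; any one-to-one assignment of $R_Y$-bit codes would do.

```python
def encode_collections(microinstructions):
    """Assign consecutive binary codes to distinct collections of microoperations."""
    distinct = []
    for collection in microinstructions:
        key = frozenset(collection)
        if key not in distinct:          # keep the first occurrence only
            distinct.append(key)
    width = max(1, (len(distinct) - 1).bit_length())   # R_Y, formula (9)
    return {key: format(q, f"0{width}b") for q, key in enumerate(distinct)}

# The eight collections Y1..Y8 of the example; Y1 is repeated on purpose.
memory_words = [
    {"y0", "y1", "y2"}, {"y0", "y3", "y9"}, {"y4"}, {"y0", "y3", "y5"},
    {"y0", "y1", "y2"}, {"y0", "y3", "y6"}, {"y1", "y7"}, {"y8"},
    {"y1", "y2", "yE"},
]

codes = encode_collections(memory_words)
print(len(codes))                                  # 8
print(codes[frozenset({"y0", "y1", "y2"})])        # 000
print(codes[frozenset({"y1", "y2", "yE"})])        # 111
```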
To specify the block BMM, it is enough to replace the collections in Table 2 by their codes. Each output of an OLC $a_g \in B_i$ is complemented by the value of the first bit of the code $K(B_i)$. In our example, the block BMM is represented by Table 3, and the variable $v_1$ is included in the output of OLC $a_7$.
TABLE 3
Content of Block BMM for CMCU $U_2(\Gamma_1)$
Address  000  001  010  011  100  101  110     111
00       000  001  000  011  100  000  100     001
01       001  011  101  100  001  011  110 v1  100
10       010  100  *    110  000  010  *       111
11       *    110  *    *    101  *    *       *

The block BMO is specified by a table with the columns $K(Y_q)$, $Y_q$, $q$. This table is constructed in a trivial way (Table 4).

TABLE 4
Content of Block BMO for CMCU $U_2(\Gamma_1)$
K(Yq)  Yq          q    K(Yq)  Yq          q
000    y0, y1, y2  1    100    y0, y3, y6  5
001    y0, y3, y9  2    101    y1, y7      6
010    y4          3    110    y8          7
011    y0, y3, y5  4    111    y1, y2, yE  8

To construct the table of transitions for CMCU $U_2$, it is necessary to construct the system of generalized formulae of transitions [4] for the classes $B_i \in \Pi_C$. Let the following system exist for our example:
$$\begin{aligned}
B_1 &\to x_1 b_4 \vee \bar{x}_1 b_8; \\
B_2 &\to x_3 b_{10} \vee \bar{x}_3 x_4 b_{13} \vee \bar{x}_3 \bar{x}_4 b_{17}; \\
B_3 &\to x_2 b_{17} \vee \bar{x}_2 x_3 b_{20} \vee \bar{x}_2 \bar{x}_3 b_{18}; \\
B_4 &\to b_{20}; \\
B_5 &\to x_1 b_{22} \vee \bar{x}_1 b_{11}.
\end{aligned} \tag{23}$$
Such a system is the basis for constructing the CMCU transition table, which includes the following columns: $B_i$, $K(B_i)$, $b_q$, $A(b_q)$, $X_h$, $\Phi_h$, $\Psi_h$, $h$. The purpose of each column is clear from Table 5. The number of rows $H$ of this table is determined by the number of terms in the system of generalized formulae of transitions. In our case, $H = 11$.

TABLE 5
Fragment of Transition Table for CMCU $U_2(\Gamma_1)$
Bi  K(Bi)  bq   A(bq)  Xh       Φh  Ψh     h
B3  010    b17  10100  x2       -   D1 D3  6
           b20  11000  ¬x2 x3   -   D1 D2  7
           b18  10101  ¬x2 ¬x3  D5  D1 D3  8

This fragment describes the transitions for class $B_3$, starting from the sixth term of system (23). The table of transitions is used to derive the functions (18)-(19), which depend on the terms
$$F_h = \left( \bigwedge_{r=1}^{R_B} v_r^{l_{rh}} \right) X_h \quad (h = 1, \ldots, H). \tag{24}$$
In system (24), the symbol $l_{rh}$ stands for the value of bit $r$ of the code $K(B_i)$ in row $h$ of the table, $v_r^0 = \bar{v}_r$, $v_r^1 = v_r$ $(r = 1, \ldots, R_B)$. For example, the following system can be derived from Table 5:
$$\begin{aligned}
D_1 &= F_6 \vee F_7 \vee F_8 = \bar{v}_1 v_2 \bar{v}_3; \\
D_3 &= F_6 \vee F_8 = \bar{v}_1 v_2 \bar{v}_3 x_2 \vee \bar{v}_1 v_2 \bar{v}_3 \bar{x}_2 \bar{x}_3; \\
D_5 &= F_8 = \bar{v}_1 v_2 \bar{v}_3 \bar{x}_2 \bar{x}_3.
\end{aligned}$$
The table of the BCT includes the columns $a_g$, $K(a_g)$, $B_i$, $K(B_i)$, $V^2$, $g$. In our example, Table 6 represents the block BCT.

TABLE 6
Specification of Block BCT for CMCU $U_2(\Gamma_1)$
ag  K(ag)  Bi  K(Bi)  V2     g
a1  000    B1  000    -      1
a2  001    B2  001    v3     2
a3  010    B2  001    v3     3
a4  011    B3  010    v2     4
a5  100    B3  010    v2     5
a6  101    B4  011    v2 v3  6
a7  110    B5  100    -      7
a8  111    B6  -      -      8
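The column $V^2$ of the BCT table can be derived from the class codes once the split between $V^1$ and $V^2$ is fixed. The sketch below assumes, as in the example, that the first $R_3$ bits of $K(B_i)$ form $V^1$ (kept in the BMM) and the remaining bits form $V^2$; the function name `bct_rows` and the dictionary layout are hypothetical.

```python
def bct_rows(class_of, class_code, r3: int):
    """For each OLC, list the V2 variables that are equal to 1."""
    rows = {}
    for olc, cls in class_of.items():
        code = class_code[cls]
        # Bits 1..r3 are generated by the BMM; bits r3+1..R_B by the BCT.
        rows[olc] = [f"v{r}" for r in range(r3 + 1, len(code) + 1)
                     if code[r - 1] == "1"]
    return rows

# Classes and codes of the example; a8 is omitted because it is not in C1.
class_of = {"a1": "B1", "a2": "B2", "a3": "B2", "a4": "B3",
            "a5": "B3", "a6": "B4", "a7": "B5"}
class_code = {"B1": "000", "B2": "001", "B3": "010", "B4": "011", "B5": "100"}

rows = bct_rows(class_of, class_code, r3=1)
for olc, variables in rows.items():
    print(olc, " ".join(variables) or "-")
```

For `a7` the list is empty because the only nonzero bit of $K(B_5) = 100$ is $v_1$, which belongs to $V^1$.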
Recall that the variable $v_1$ is generated by the block BMM. At the same time, there is no code $K(B_6)$ because $a_8 \notin C_1$. Obviously, this table specifies the content of EMBs. If the logic circuit of the BCT is implemented using PAL macrocells, then Table 6 corresponds to Karnaugh maps for the functions $v_r \in V^2$. To optimize system (20), the OLCs $a_g \in C_1$ should be encoded in an optimal way. The well-known method ESPRESSO [1], for example, can be used for such an encoding. We do not discuss this task in this article.
Implementation of the logic circuit of CMCU $U_2$ is reduced to the implementation of systems (18)-(19) using PAL macrocells and of tables similar to Table 3, Table 4, and Table 6 using EMBs. To solve this task, a designer can use either standard tools [4] or some known methods [8]. We do not discuss this step either.
Let us point out that the control memory of CMCU $U_1(\Gamma_1)$ includes 32*12 = 384 bits (if $t = 2$), and its transition table includes 17 lines. In the CMCU $U_2(\Gamma_1)$, the BMM includes 32*4 = 128 bits, the BMO requires 8*11 = 88 bits, and the BCT consumes 8*2 = 16 bits. Therefore, the control memory of CMCU $U_2(\Gamma_1)$ uses 232 bits, and its transition table has $H = 11$ lines. This means that the CMCU $U_2(\Gamma_1)$ requires about 1.65 times less memory (384/232), and its block BMA includes 1.54 times fewer terms (17/11).
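The memory figures above can be reproduced with a few lines of arithmetic. In the sketch below, the count of 11 microinstruction outputs ($y_1, \ldots, y_9$ plus $y_0$ and $y_E$) and the rounding of each word up to a multiple of $t$ are assumptions inferred from the products quoted in the text; all names are hypothetical.

```python
def rounded_up(bits: int, t: int) -> int:
    """Round a word width up to a whole number of EMBs with t outputs."""
    return -(-bits // t) * t

WORDS = 32    # 2**R addressable cells, R = 5
N_OUT = 11    # assumed outputs: y1..y9, y0, yE
T_OUT = 2     # outputs per EMB
R_Y, R_4, Q1 = 3, 2, 8

u1_memory = WORDS * rounded_up(N_OUT, T_OUT)   # 32 * 12
bmm = WORDS * rounded_up(R_Y, T_OUT)           # 32 * 4
bmo = Q1 * N_OUT                               # 8 * 11
bct = 8 * R_4                                  # 8 OLC codes, 2 bits each
u2_memory = bmm + bmo + bct

memory_ratio = u1_memory / u2_memory
term_ratio = 17 / 11                           # transition table lines, U1 vs U2
print(u1_memory, u2_memory)                    # 384 232
print(f"{memory_ratio:.3f} {term_ratio:.3f}")  # 1.655 1.545
```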
V. Conclusion
In this article we propose a method aimed at decreasing the number of macrocells in the logic circuit of a CMCU. The method is based on including a field with the code of the class of pseudoequivalent OLCs into the microinstruction format. The size of the CMCU control memory is also decreased due to the maximal encoding of collections of microoperations. To decrease the number of macrocells in the block of microinstruction addressing, a special code transformer is used. It transforms OLC codes into the codes of their classes. This block can be absent if condition (16) holds. In this case, the transformation is executed by the CMCU block of micromemory.
However, such an approach leads to a CMCU $U_2$ with lower performance than that of the CMCU with code sharing $U_1$. Let us point out that reducing the number of macrocells in the logic circuit can also decrease its number of levels, which can compensate for the negative effect of splitting the memory into two blocks. We synthesized several examples using the standard package WebPack. The results show that the number of macrocells is decreased by up to 30%, and the number of required memory blocks is decreased by up to 50%. The comparison is given for CMCU $U_1$ and $U_2$. At the same time, the number of levels in the logic circuit of CMCU $U_2$ is decreased to 2-3. Let us recall that the proposed method can be applied only to linear GSAs for which condition (5) holds.
The scientific novelty of the proposed method lies in the use of the classes of pseudoequivalent OLCs and of free EMB resources for decreasing the number of macrocells in the block of microinstruction addressing. In addition, the encoding of collections of microoperations decreases the required memory resources. The practical significance of the method lies in decreasing the number of macrocells and EMBs in the CMCU logic circuit. This allows designing circuits with a smaller amount of hardware in comparison with known control units oriented on linear GSAs.
References
[1] De Micheli G. Synthesis and Optimization of Digital Circuits. - NY: McGraw-Hill, 1994. - 636 pp.
[2] Barkalov A.A., Wegrzyn M. Design of Control Units with Programmable Logic. - Zielona Gora: UZG Press, 2006. - 150 pp.
[3] Macrocell Configurations in CoolRunner XPLA3 CPLDs. http://www.xilinx.com/support/documentation/application_notes/xapp335.pdf
[4] Kania D. Two-level logic synthesis on PALs // Electronic Letters. - 1999, № 7. - pp. 879-890.
[5] Barkalov A., Titarenko L. Logic Synthesis for Compositional Microprogram Control Units. - Berlin: Springer, 2008. - 272 pp.
[6] Barkalov A.A., Kovalyov S.A., Bieganowski J., Miroshkin A.N. Synthesis of control unit with code sharing and modified linear chains. Machinebuilding and Technosphere XXI // Proc. of XV Int. Scientific Conf. Sevastopol 15-20 September 2008. - Donetsk: DonNTU, 2008. Vol. 4. - P. 54-59. (in Russian).
[7] Barkalov A.A., Krasichkov A.A., Miroshkin A.N. Control Device Synthesis with code devising and modification of operator line chains. Sc. Trans. of Donetsk National Technical University. Series "Informatics, Cybernetics and Calculate Techniques". Issue 9 (132). - Donetsk: DonNTU, 2008. - P. 183-187. (in Russian).
[8] Solovjev V.V. Digital circuit design with CPLD. - Moscow: Hot Line-Telecom, 2001. - 636 pp. (in Russian).
[9] Baranov S. Logic Synthesis for Control Automata. - Boston: Kluwer Academic Publishers, 1994. - 312 pp.
Aleksander A. Barkalov - Doctor of Science, Professor of DonNTU (Ukraine), Professor of University of Zielona Gora, Poland.
Dr. Barkalov's scientific interests: digital control units, SoPC.
Address: Campus A, Budynek Dydaktyczny / A-2, prof. Z. Szafrana str. 2, 65-516 Zielona Gora.
Larysa A. Titarenko - Doctor of Science, Professor of Kharkiv National University of Radioelectronics (KNURE), Professor of University of Zielona Gora, Poland.
Dr. Titarenko's scientific interests: digital, adaptive, and spatial-time processing of signals in telecommunication; management and control in communication networks; research of modern digital telecommunication systems and networks.
Aleksander N. Miroshkin - Assistant of Donetsk National Technical University.
Scientific interests: digital control units.