
CC BY



LARGE AND VERY LARGE SIZE INTRACTABLE COMBINATORIAL SOC AND VLSI PHYSICAL DESIGN PROBLEMS: HOW TO SOLVE THEM WITH HIGH QUALITY?

ROMAN BAZYLEVYCH

Software Engineering Department, Lviv Polytechnic National University, 12 S. Bandera Street, Lviv, 79013, Ukraine, e-mail: rbaz@polynet.lviv.ua

Abstract. This paper summarizes recent work by the author and his team on a methodology for solving the intractable combinatorial problems that occur in VLSI and SoC physical design automation. The optimal circuit reduction method has proved to be an effective and efficient tool for identifying the hierarchical cluster structure of a circuit. The author reviews the applicability of this method to a wide spectrum of problems, including hierarchical clustering, partitioning, packaging, and placement. He develops a general approach to these problems based on the recursive use of high quality global and local optimization algorithms on unique, not very large problems. Experiments confirm the high effectiveness of this approach. For some well-known test cases optimal results were achieved for the first time, while for many other cases improved results were obtained.

1. Introduction

Rapid growth in the complexity of electronic circuits requires a continued search for new, effective approaches to CAD problems. Combinatorial optimization methods are especially important for VLSI and SoC physical design, where partitioning, packaging, placement and routing are the central problems. These problems share the same input data: the circuit is given as a set of elements and a set of electrical nets. This common structure makes it feasible to apply one approach to all of them when the problem size is large or very large: hierarchical decomposition and multilevel macromodeling, which are realized by identifying the cluster structure of the circuit, that is, by its hierarchical clustering. Every real circuit has clots, or clusters, in which the elements are interconnected more densely than elsewhere. Small clusters are nested in larger ones, forming a hierarchical structure. The idea is to operate not on the original elements, whose number is extraordinarily high, but on clusters that can be described mathematically by macromodels. This, first, substantially decreases the size of the problem, which simplifies its solution and reduces the computational cost, and, second, considerably improves the quality of the solution, because it becomes easier to reach the zone of the global optimum. The search for solutions is more effective in the significantly smaller space of macromodels.

When the circuit can be described by macromodels with a hierarchical structure, a large problem can be reduced to the recursive solution of small tasks. All of these tasks are small, are properly nested one in another, and thus can be solved by the same basic procedure. On each level of the hierarchy the problem is solved in two stages. At the first stage, a set of small tasks of the same type is solved on macromodels, usually by high quality approximate methods or even by exact ones. At the second stage, the partial solutions from the local areas are joined together, and local optimization is performed on the problem with all variables of the given level. We then proceed to the next level.

The basic concepts of this approach are described in [1]. It was later developed for partitioning, packaging, and placement [2-14]. A transition to macromodeling requires forming groups of elements with certain parameters. How should such groups be generated, and which parameters are decisive for selecting them? There are no simple answers to these questions. Obviously, they should be groups of elements that are distinguished by a greater number of internal connections, in other words, groups that are more dense. Such groups are referred to as clusters. Identifying the clusters in a circuit is highly relevant, because they can subsequently be used in various design automation problems.

Operating on clusters instead of on the original elements allows us to significantly decrease the number of unknown variables and to obtain a better solution.

2. Hierarchical Circuit Clustering as a Precondition for Effective Solving of Intractable Combinatorial Problems of Large and Very Large Size

Many design automation problems can be handled on hierarchically built clusters instead of on the original circuit elements. The Optimal (Parallel) Circuit Reduction (OCR) method [1,2] was proposed specifically for this purpose. Its main concept is to build, at the first stage, the Optimal Reduction Tree TR using a bottom-up strategy. This is a rooted tree whose leaves correspond to the original circuit elements of the set P and whose root corresponds to the whole circuit. The aim is to create a tree whose intermediate vertices correspond to circuit clusters that have more internal nets than other groups of elements. In addition, it is useful to build the tree so that the individual clusters are treated independently and in parallel, in contrast to the well-known "greedy" approach. In the latter, clusters are treated serially, so the first clusters are created and processed under more favorable conditions, and the clusters created last are much worse than those created earlier. Our approach removes this disadvantage by providing conditions for parallel group formation, and therefore tends to identify the natural clusters in the circuit. This means that the tree TR can be considered a Hierarchical Cluster Tree: it shows the natural hierarchy in which smaller clusters enter larger ones. The computational complexity of the OCR method is close to linear [12], which confirms that the method can be used for large and very large problems with millions of elements.
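The difference between parallel and greedy cluster formation can be illustrated with a minimal Python sketch. It is not the OCR method of [1,2]: the merging criterion (shared nets divided by combined cluster size) and the names `build_reduction_tree`, `elements` and `nets` are assumptions made for illustration, and the all-pairs scoring used here is quadratic rather than near linear. Unconnected pieces are forced under one root for simplicity.

```python
from itertools import combinations

def build_reduction_tree(elements, nets):
    """Build a cluster tree bottom-up by rounds of independent pair merges.

    elements: non-empty list of element names.
    nets: list of sets of element names, one set per net.
    Returns the tree as nested tuples whose leaves are element names.
    """
    # Each cluster is stored as (subtree, frozenset of its elements).
    clusters = [(e, frozenset([e])) for e in elements]
    while len(clusters) > 1:
        # Assumed criterion: nets shared by two clusters per combined element.
        scored = []
        for (i, (_, a)), (j, (_, b)) in combinations(enumerate(clusters), 2):
            shared = sum(1 for net in nets if net & a and net & b)
            scored.append((shared / (len(a) + len(b)), i, j))
        scored.sort(key=lambda s: (-s[0], s[1], s[2]))
        # Merge disjoint pairs within one round, so clusters grow in parallel.
        used, merged = set(), []
        for score, i, j in scored:
            if i in used or j in used:
                continue
            if score == 0 and merged:
                continue  # join unconnected clusters only if nothing else merged
            used.update((i, j))
            (tree_i, set_i), (tree_j, set_j) = clusters[i], clusters[j]
            merged.append(((tree_i, tree_j), set_i | set_j))
        clusters = merged + [c for k, c in enumerate(clusters) if k not in used]
    return clusters[0][0]

nets = [{"a", "b"}, {"a", "b"}, {"c", "d"}, {"c", "d"}, {"b", "c"}]
tree = build_reduction_tree(["a", "b", "c", "d"], nets)
```

In this toy circuit the pairs (a, b) and (c, d) are formed in the same round, and only then joined at the root.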

3. Partitioning

3.1. Hierarchical circuit decomposition

The hierarchical circuit decomposition can be of two types: free and enforced. The former refers to clustering by the OCR method with free merging of elements whenever the value of the criterion is optimal. Clusters identified in this way are usually of different sizes. This type of decomposition most adequately reflects the natural circuit structure. It can be effective in choosing the most appropriate number of partitions into which the circuit should be divided for subsequent implementation in given units. Any cut in the reduction tree separates the circuit into partitions, so by choosing the cut one can define the most suitable partitioning. This is a significant distinction between the proposed approach and others, in which the important partitioning problem is solved "blindly", without a proper foundation and without an understanding of the internal cluster structure of the circuit. The method facilitates a well-grounded choice of the decomposition parameters and identification of the decomposition structure, taking into account the identified circuit cores. For example, if the circuit shows five clusters of approximately identical size, then it serves no purpose to break it up into four or six partitions. Other methods do not have this advantage: they select the decomposition parameters without considering the natural cluster structure of the circuit, and this can substantially worsen the solution quality.

Enforced partitioning, where the division coefficients are determined in advance, is very important for some problems in physical design automation. As input data we propose to use the hierarchical cluster structure obtained by free clustering. Here the problem arises of packaging the clusters into blocks with given parameters. In such partitioning some big blocks have to be divided into smaller units, and the clusters of the reduction tree suggest how to do it.

R&I, 2003, N 3

Thus, in our approach, instead of operating on the original circuit elements, we recommend operating on clusters whose hierarchical nesting structure has been identified. This significantly simplifies the problem and facilitates finding better solutions. Nevertheless, the problem remains far from simple, and that is why we recommend solving it in two stages. At the first stage an initial solution is found by a constructive method; at the second stage it is improved by an iterative method. An important feature of the approach is that the hierarchical circuit clustering obtained by the OCR method is used at both stages.
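The statement that any cut of the reduction tree defines a partitioning can be made concrete with a small Python sketch. The nested-tuple tree representation, the function names, and the cutting rule (descend until a cluster has at most `max_size` elements) are illustrative assumptions, not the author's procedure.

```python
def leaves(tree):
    """Return the original elements under a tree vertex, left to right."""
    if not isinstance(tree, tuple):
        return [tree]
    return [x for child in tree for x in leaves(child)]

def cut_tree(tree, max_size):
    """Cut the tree at the highest vertices holding at most max_size elements."""
    members = leaves(tree)
    if len(members) <= max_size or not isinstance(tree, tuple):
        return [members]
    return [part for child in tree for part in cut_tree(child, max_size)]

# Hypothetical reduction tree with seven elements.
tree = ((("a", "b"), "c"), (("d", "e"), ("f", "g")))
free_cut = cut_tree(tree, 4)   # two natural clusters of sizes 3 and 4
finer_cut = cut_tree(tree, 3)  # the larger cluster splits into two pairs
```

The partitions produced this way generally have different sizes, which is the free decomposition described above; meeting prescribed sizes is the separate, enforced problem.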

3.2. Initial partitioning

Two strategies of initial partitioning are possible in our approach: serial and parallel-serial [4,5]. The main concept behind the former is successive partitioning by means of the reduction tree. On this tree a vertex is found whose number of elements is equal to or greater than the desired value and which was formed first among such clusters. This vertex corresponds to the densest cluster with the desired number of elements, or with the number that best approaches this value. If the number of elements in this cluster is exactly the desired one, we create the first partition and move on to the next partitions. If this number is greater than desired, the problem is to remove the necessary number of elements. This problem is solved in a substantially narrowed solution space, which considerably simplifies it and allows us to apply methods that are suitable for problems of small size. Here it is appropriate to use the OCR method again. The rest of the circuit is described by one vertex, while the cut-off piece is decomposed and considered at the level of original elements, and a new reduction tree is built. From this point two approaches are possible. In the first case, we use traditional serial methods with a modification: we consider for removal not only the original elements, but also all clusters of the new reduction tree whose number of elements is equal to or less than the number to be transferred. The procedure continues recursively until the final solution is reached. To improve the solution, a new reduction tree can be built at each successive iteration.

In the second case, we find a vertex on the new reduction tree whose number of elements best approaches the desired value but is greater than it. The tree is cut, forming a new piece whose number of elements approximates the desired value more closely. New trees are built successively in this way until a solution with the desired number of elements is obtained. The problem is thus reduced to the recursive solution of identical problems on sets that narrow at each successive step.
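The first step of the serial strategy, choosing a tree vertex whose size is at least the desired partition size, might be sketched in Python as follows. The tuple representation and the names `subtrees`, `size` and `pick_cluster` are hypothetical; the sketch selects the smallest qualifying vertex and does not model the order in which clusters were formed.

```python
def subtrees(tree):
    """Yield every vertex of a nested-tuple reduction tree, root first."""
    yield tree
    if isinstance(tree, tuple):
        for child in tree:
            yield from subtrees(child)

def size(tree):
    """Number of original elements under a vertex."""
    return sum(size(c) for c in tree) if isinstance(tree, tuple) else 1

def pick_cluster(tree, desired):
    """Return the smallest vertex with at least `desired` elements,
    together with the number of excess elements that must be removed.
    Assumes desired does not exceed the size of the whole tree."""
    candidates = [t for t in subtrees(tree) if size(t) >= desired]
    best = min(candidates, key=size)
    return best, size(best) - desired

tree = ((("a", "b"), "c"), (("d", "e"), ("f", "g")))
exact = pick_cluster(tree, 3)    # a cluster of exactly three elements exists
inexact = pick_cluster(tree, 5)  # only the root qualifies, two elements too many
```

When the excess is nonzero, the removal of the surplus elements is the small subproblem that the text solves recursively on a new reduction tree.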

As our experiments show, the strategies described have a drawback inherent to all serial (greedy) algorithms: the partitions separated first are of good quality, and those separated last are of bad quality. The first partitions are cut from the best pieces of the circuit and leave behind "tails" that are weakly connected to one another. At the final steps such "tails" form partitions with a large number of external nets and a small number of internal nets. How can this weakness of the serial strategy be removed? For this purpose a parallel-serial strategy is offered [4]. This strategy performs a top-down, dichotomous division of the circuit under the constraint that the number of elements in each piece should be a multiple of the desired number of elements in one partition. In the first step we consider the two highest vertices. For each vertex we determine the number of partitions that can be formed from it and the number of elements in the remainder. The vertex with the smaller remainder is chosen, and the next step is to transfer its remaining elements to the other piece in the optimal way. This remainder is not large: it does not exceed half of one partition. Any good serial method can be used here, but the recursive approach described above, in which redundant elements are eliminated by means of the reduction tree, is more appropriate. After the desired number of elements has been transferred from one piece to the other, the problem is reduced to two new problems of the same type but of smaller size. In both cases we use identical procedures to transfer a small number of elements from one piece to another, and these procedures are performed recursively on sets that decrease from step to step.
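The arithmetic of one dichotomy step can be checked with a short Python sketch. It assumes, as the constraint above implies, that the total number of elements is a multiple of the partition size; the function name and return format are illustrative. Under that assumption the two remainders sum to zero or to one partition size, so the smaller of them never exceeds half a partition.

```python
def plan_dichotomy(n_left, n_right, part_size):
    """Plan one top-down division step.

    Returns (moved, direction, parts_left, parts_right), where `moved`
    elements go from the piece with the smaller remainder to the other one.
    """
    if (n_left + n_right) % part_size:
        raise ValueError("total size must be a multiple of part_size")
    r_left, r_right = n_left % part_size, n_right % part_size
    if r_left <= r_right:
        moved, direction = r_left, "left->right"
        new_left = n_left - moved
    else:
        moved, direction = r_right, "right->left"
        new_left = n_left + moved
    new_right = n_left + n_right - new_left
    return moved, direction, new_left // part_size, new_right // part_size

step_a = plan_dichotomy(23, 17, 10)  # remainders 3 and 7
step_b = plan_dichotomy(28, 12, 10)  # remainders 8 and 2
```

Which elements to move is the actual optimization problem; the sketch only fixes how many must move and in which direction.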

In the parallel-serial algorithms it is appropriate to combine the initial selection of strongly connected partitions as cores with the spilling of weakly connected groups, and to apply the same procedures again on the narrowed set of elements. Other algorithms can also be used here. The algorithms described cannot guarantee arrival at the optimal solution, and the choice of strategy is not simple; that is why further optimization of the initial solution is necessary. What the algorithms do achieve is a significant narrowing of the solution space. An approach described in [1] reduces the circuit to a level at which, in terms of the number of elements, other high quality methods can be applied. Such an approach is rational when the reduction of the circuit is sufficiently "parallel" and the clusters are close in size.

The main advantage of the suggested strategies for the initial solution is that, at the initial stage, all our algorithms conform to the hierarchical circuit clustering, which identifies subcircuits that are treated as partition cores. This clustering is also used to solve partial problems at particular stages of the algorithms. As for the computational complexity of these algorithms, the number of basic operations is proportional to the number of partitions formed and to their size. Consequently, the computational complexity is close to linear in the number of original circuit elements, and thus these algorithms are applicable to problems of large and very large size.

3.3. Partitioning optimization

A new approach to initial partitioning was described in the section above. Like any constructive approach, it should be combined with optimizing algorithms that improve the solution. The new strategies offered here, like other optimizing algorithms, are iterative. The substantial distinction between the approach described below and others is that most existing methods select original circuit elements according to some strategy and then exchange them between partitions, whereas we propose to perform the exchange by clusters of arbitrary size. To identify clusters for exchange, reduction trees are built for all initial partitions. The optimization is performed by pairs or by sets, which can include arbitrary clusters as well as original circuit elements. All original circuit elements, arbitrary clusters and groups of elements are considered as candidates, and the exchange is performed by the best pairs and groups of original elements and clusters. After one such iteration, the reduction trees are rebuilt to determine new components for exchange. If there are more than two partitions, the order of their optimization becomes important. To determine this order, a matrix of connectivity between partitions is analyzed, and optimization is performed between independent pairs of partitions in descending order of the number of their common nets. The proposed approach optimizes over considerably wider domains of feasible solutions than traditional ones. The experiments confirm its high effectiveness [7].
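The ordering rule for more than two partitions can be sketched in Python. The sketch assumes that "independent pairs" means pairs sharing no partition, and that pairs without common nets are skipped; both are interpretations, and the function names are hypothetical.

```python
from itertools import combinations

def common_net_counts(partitions, nets):
    """Connectivity matrix as a dict: nets touching both partitions of a pair."""
    counts = {}
    for i, j in combinations(range(len(partitions)), 2):
        counts[(i, j)] = sum(
            1 for net in nets if net & partitions[i] and net & partitions[j]
        )
    return counts

def optimization_order(partitions, nets):
    """Pick disjoint partition pairs in descending order of common nets."""
    counts = common_net_counts(partitions, nets)
    ranked = sorted(counts, key=lambda pair: (-counts[pair], pair))
    busy, order = set(), []
    for i, j in ranked:
        if counts[(i, j)] == 0 or i in busy or j in busy:
            continue  # unconnected pair, or a partition already scheduled
        busy.update((i, j))
        order.append((i, j))
    return order

partitions = [{"a", "b"}, {"c", "d"}, {"e", "f"}, {"g", "h"}]
nets = [{"a", "c"}, {"b", "c"}, {"b", "d"}, {"e", "g"}, {"d", "e"}]
order = optimization_order(partitions, nets)
```

Here partitions 0 and 1 share three nets and are optimized first; the pair (1, 2) is skipped in this round because partition 1 is already taken, so (2, 3) follows.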

4. Packaging

Packaging is a specific partitioning problem with strict constraints on the number of elements and on the number of external nets of each partition. Consequently, algorithms such as those above can be used with some additional procedures that enforce the constraints. The goal of packaging is to obtain the minimal number of partitions. Some possible strategies are discussed in [15].
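The two packaging constraints can be expressed as a small Python check. The net model (a list of sets of element names) and the parameter names `max_elements` and `max_external` are assumptions for illustration; a net is counted as external when it has elements both inside and outside the partition.

```python
def external_nets(part, nets):
    """Count nets with at least one element inside and one outside `part`."""
    return sum(1 for net in nets if net & part and net - part)

def is_feasible(part, nets, max_elements, max_external):
    """Check both packaging constraints for one partition."""
    return (len(part) <= max_elements
            and external_nets(part, nets) <= max_external)

nets = [{"a", "b"}, {"b", "c"}, {"c", "d"}, {"a", "d"}]
part = {"a", "b"}
crossing = external_nets(part, nets)  # nets {b, c} and {a, d} leave the partition
```

A check of this kind would be applied to every candidate cluster or exchange before it is accepted.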

Here we briefly describe one approach [8] that gave sufficiently good results. The algorithm starts from the cluster of the reduction tree that is the first to violate the constraint on the number of elements. From this cluster we form the first partition with as many elements as possible without violating the constraint on the number of external nets. Two strategies are used: removing the minimal number of elements, and identifying the best cluster that does not violate the constraint. The next step is the addition of the maximum number of elements. The experiments reveal the advantage of combining both strategies, with iterative removal and addition of elements and clusters. The partitions separated first have a good density; the final ones have a bad density. This is caused first of all by the "greedy" nature of partitioning by the serial strategy. As a result, the number of partitions can be greater than the optimum. To remove this drawback the following strategy is applied. Partitions whose number of elements is less than the constraint are merged into one or several partitions without violating it. The next step is optimization on the set of all partitions, which allows the number of elements to increase, but not beyond the constraints, in the first group of partitions, which were not subject to merging. Often such optimization substantially decreases the number of external nets of the final partitions, even down to the desired value. If this cannot be achieved, the new final partition is divided into two smaller ones. The first partition should be without violation; the second may exhibit a violation of the constraint on the number of external nets if it cannot be created without one, and so forth, up to the completion of the problem. The experiments confirm the high effectiveness of this approach on a set of well-known test cases considered in [15, 16]. The results obtained are never worse than the known ones, and for 5 of the 12 circuits they are the best known and are optimal. Where our results are not theoretically optimal, they are close to the optimal solutions and differ from them minimally, i.e. by only one partition (circuits c5315, s13207, and s38417) or two partitions (circuit c2670).

5. Floorplanning and Placement

Hierarchical clustering and decomposition can be exploited most fully in placement. Such an approach is especially effective for problems of large and very large sizes. The problem is solved in several stages: bottom-up free hierarchical circuit clustering; mathematical description of macromodels; top-down placement with multilevel global and local optimization. The first stage is the determination of circuit clusters, for which the OCR method is used. The aim of the second stage is to obtain the subcircuits and describe them mathematically by macromodels. The clusters formed at the first stage are used for creating the macromodels. These clusters usually have different sizes. A peculiarity of this problem is that the sizes of macromodels can be considered "flexible" ("soft macros"), because they contain internal elements that can be placed in various ways. It is more appropriate here to create macromodels of identical sizes, which significantly simplifies the problem by reducing it to standard procedures that can easily be formalized. The enforced hierarchical decomposition is formed for this purpose. Its structure is defined at the previous stage. The division coefficients can be different at every level. Their choice should be determined by the descriptions of the geometrical zones and by the number of macromodels [1].

At every partitioning level, the number of geometrical zones is taken as large as possible, in order to solve the problem more exactly. Increasing the number of macromodels at every level reduces the number of levels and improves the solution quality, provided that the problem is solved exactly at every step [6]. At every level it is therefore appropriate to choose the largest division coefficient. We suggest giving it a value of not less than four. The maximum division coefficient is defined as the maximum number of elements that can be placed by the available procedures with an exact or nearly exact solution. The shape of the geometrical zones is best chosen as close to a circle as possible, which facilitates the compact placement of strongly connected components. Such a form is ideal, because it provides the minimum total length of internal nets within the zones. For some devices, for example for matrix VLSI with a linear structure, such zones can correspond to rows or to groups of rows. The lowest placement level corresponds to the original circuit elements.
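The relation between the division coefficient and the number of levels can be illustrated with a short Python calculation. It assumes the same coefficient at every level, which the text does not require, and the function name is hypothetical.

```python
def levels_needed(n_elements, coefficient):
    """Smallest number of levels L such that coefficient**L >= n_elements."""
    if coefficient < 2:
        raise ValueError("division coefficient must be at least 2")
    levels, capacity = 0, 1
    while capacity < n_elements:
        capacity *= coefficient  # one more level multiplies the leaf count
        levels += 1
    return levels

# For a circuit of 4002 elements (the size of the T4000 test case):
with_four = levels_needed(4002, 4)   # 4**6 = 4096
with_eight = levels_needed(4002, 8)  # 8**4 = 4096
```

Doubling the coefficient from four to eight removes two levels in this example, at the price of larger subproblems at each level.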

The placement area should be divided into geometrical zones only after hierarchical circuit clustering, and not before. The circuit clustering can give a useful indication of what sizes these zones should be. The circuit may contain unconnected pieces, which are identified by the formation of not one but several reduction trees. At the next stage, the macromodels are placed in the geometrical zones. At each level the problem is considered in two substeps: global placement of macromodels within the determined geometrical boundaries, and local optimization over the entire surface without boundaries. In the second substep, the boundaries between geometrical zones are, as it were, erased, and the macromodels are allowed to shift within the entire surface. Any iterative method can be applied here, but in our opinion the most appropriate is the scanning area method, which has confirmed its high effectiveness [1]. Considerable attention was given to developing algorithms for finding the exact solution in problems involving a large number of elements [3,6,9-10]. For higher quality results, the largest possible scanning area and crossing zones should be chosen. Recently we suggested a new stochastic scanning area algorithm that allows the size of the problems to be increased [11]. In many cases, instead of a full scan of the entire surface, it is sufficient to perform optimization only on the boundaries of macromodels. Our investigations showed that in the majority of cases two or three such iterations at each level of decomposition are sufficient.
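The text does not spell out the scanning area method, so the following Python sketch shows only one plausible reading of it for a linear placement: a window of a few positions slides along the row, and the elements inside it are rearranged exhaustively. The cost function (sum of net spans), the window size and all names are assumptions, not the algorithm of [1].

```python
from itertools import permutations

def wirelength(order, nets):
    """Sum over nets of the distance between the outermost positions."""
    pos = {e: i for i, e in enumerate(order)}
    return sum(max(pos[e] for e in net) - min(pos[e] for e in net)
               for net in nets)

def scanning_area_pass(order, nets, window=3):
    """Slide a window along the row and keep the best arrangement inside it."""
    order = list(order)
    for start in range(len(order) - window + 1):
        head, tail = order[:start], order[start + window:]
        best = min(permutations(order[start:start + window]),
                   key=lambda p: wirelength(head + list(p) + tail, nets))
        order[start:start + window] = best
    return order

nets = [{"a", "b"}, {"c", "d"}]
before = ["a", "c", "b", "d"]
after = scanning_area_pass(before, nets, window=3)
```

Because the current arrangement is always among the candidates, a pass never increases the cost. The exhaustive search inside the window grows factorially with its size, which is consistent with the small scanning areas of 4 to 6 elements mentioned in the experiments.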

Our experiments confirm the high effectiveness of the approach developed above. One of the most widespread placement benchmarks is the Steinberg test case [17]. As early as 1979 we relatively easily obtained the result of 4131, which was later refined to 4119 by enlarging the scanning area from 4 to 6 elements. For linear and circle placement we obtained the results of 10287 and 10014 respectively [13]. To our regret, other researchers have not reported these results. In our opinion they are optima, although strictly mathematically this is not easy to prove (the complexity is 36!). This view is prompted by the fact that the results were obtained by comprehensive statistical studies of the problem with various algorithms developed on the basis of the methodology described above, including stochastic ones, over a broad spectrum of parameters. It is puzzling that authors of other new placement algorithms do not consider this test case more often. Another interesting test case is T4000, which consists of 4002 elements (http://www.twolf.com). By applying our methodology, we obtained the result of 2741200, which is 10 % better than the known result [9].
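The size of the search space quoted above, 36!, can be put in perspective with a short Python calculation. The evaluation rate of one billion placements per second is an arbitrary assumption chosen only for illustration.

```python
import math

orderings = math.factorial(36)        # number of orderings of 36 positions
rate_per_second = 10**9               # assumed evaluation speed
seconds_per_year = 365 * 24 * 3600
years = orderings // (rate_per_second * seconds_per_year)

print(f"36! is about {orderings:.2e}")
print(f"exhaustive search would take about {years:.1e} years")
```

The count is of the order of 10^41, which is why optimality of the reported results can be argued statistically but not verified by enumeration.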

6. Conclusions

The methodology of hierarchical clustering and decomposition by the OCR method, multilevel macromodeling, and a rational combination of global and local optimization has proved its high effectiveness and applicability for solving intractable combinatorial physical design problems of large and very large sizes with high quality and near linear computational complexity. For all test cases the results are not worse, and in many cases they are better, than those obtained by other known methods. For some cases, optimal results were obtained for the first time.

References:

1. Bazylevych R.P. Decomposition and topological methods for electronic devices physical design automation // Lviv: Vyshcha shkola, 1981. 168 p. (In Russian).
2. Bazylevych R.P., Tkachenko S.P. The partitioning problem solving by the parallel reduction method // Vychuslitelnaia technika: Materialy konferencii po razvitiiu technicheskich nauk po avtomatizirovannomu proektirovaniiu, vol. 7, Kaunas, 1975. P. 295-298 (In Russian).
3. Bazylevych R.P., Telyuk T.M. VLSI and PCB placement optimization using hierarchical scanning area method // 42 Internationales Wissenschaftliches Kolloquium, Technische Universitat Ilmenau. Ilmenau, 1997. P. 594-599.
4. Bazylevych R.P., Rybak O.G. Top-down circuits partitioning algorithm for a few partitions // Visnyk of the Lviv Polytechnic State University, 1998. No. 349. P. 181-185 (In Ukrainian).
5. Bazylevych R.P., Rybak O.G. Tractable algorithms of dichotomy partitioning with given initial conditions // Visnyk of the Lviv Polytechnic State University, 1998. No. 349. P. 185-191 (In Ukrainian).
6. Bazylevych R.P., Telyuk T.M. et al. The investigation of partitioning parameters influence for placement element's performance by the method of hierarchical decomposition // Visnyk of the Lviv Polytechnic State University, 1998. No. 351. P. 136-141 (In Ukrainian).
7. Bazylevych R.P., Rybak O.G. FPGA packaging optimization by the optimal circuit reduction method // Visnyk of the Lviv Polytechnic State University, 1999. No. 385. P. 6-9 (In Ukrainian).
8. Bazylevych R.P., Rybak O.G. FPGA initial packaging by the optimal circuit reduction method // Visnyk of the Lviv Polytechnic State University, 1999. No. 385. P. 10-12 (In Ukrainian).
9. Bazylevych R.P., Teliuk T.M., Podolsky I.V., Demianets T.I., Stupinsky R.N. The LSI placement problem investigation // Visnyk of the Lviv Polytechnic State University, 1999. No. 370. P. 86-90 (In Ukrainian).
10. Bazylevych R.P., Teliuk T.M. Some possibilities of placement quality improvement by the scanning area method and stochastic algorithms // Visnyk of the Lviv Polytechnic State University, 1999. No. 386. P. 148-153 (In Ukrainian).
11. Bazylevych R.P., Teliuk T.M., Shcherbyna O. Stochastic scanning area method as an effective tool for placement element optimization // Visnyk of the Lviv Polytechnic State University, 1999. No. 386. P. 176-182 (In Ukrainian).
12. Bazylevych R.P., Podolsky I.V. Hierarchical clustering of complicated circuits // Visnyk of the Lviv Polytechnic State University, 2000. No. 392. P. 155-158 (In Ukrainian).
13. Bazylevych R.P., Rachynsky M. Linear and ring element placement of elements by the hierarchical decomposition and scanning area methods // Contemporary computing in Ukraine. Symposium proceedings. Lviv, 2000. P. 79-83 (In Ukrainian).
14. Bazylevych R.P., Melnyk R.A., Rybak O.G. Circuit partitioning for FPGAs by the optimal circuit reduction method // VLSI Design, 2000. Vol. 11, No. 3. P. 237-248.
15. Nan-Chi Chou, Lung-Tien Liu, Chung-Kuan Cheng, Wei-Jin Dai, and Rodney Lindelof. Circuit partitioning for huge logic emulation systems // Proc. 31st ACM/IEEE Design Automation Conference, 1994. P. 244-249.
16. Kuznar R., Brglez F., Kozminski K. Cost minimization of partitions into multiple devices // Proc. of IEEE/ACM 30th DAC, 1993. P. 315-320.
17. Steinberg L. The Backboard Wiring Problem: a Placement Algorithm // SIAM Review, 1961. Vol. 3, No. 1. P. 37-50.
