Scientific Works of the Union of Scientists in Bulgaria-Plovdiv, Series C. Technics and Technologies, Vol. XIII, Union of Scientists, ISSN 1311-9419, Session 5-6 November 2015.
NOVEL ABSTRACTION AND PROTOTYPING METHODS FOR NOC-BASED SMART ETHERNET SWITCHES
Ile Dimitrievski, Valentin S. Mollov
Department of Computer Systems, Technical University of Sofia, Bulgaria
Abstract: Effective prototyping of Network-on-Chip (NoC) based devices has become an important issue since NoCs began to be implemented in high-performance single-chip smart switches. NoC-based switches are now required to deliver excellent performance: the lowest possible time delay and overall latency, increased traffic speed through the network switch, and increased bandwidth and throughput. This paper presents state-of-the-art methods for NoC prototyping. Several platforms used for prototyping these networks are discussed: platforms with a hard-coded core and a reconfigurable FPGA part, and platforms with a fully configurable FPGA architecture. An overview of the selected platforms is given, together with an introduction to Ethernet switch simulation methods using the ns-2 network simulator.
1. Introduction
Current algorithms applied to networks on chip (NoC) do not fulfill the requirements of the billion-transistor era. This paper discusses NoCs for that era, which combine numerous different IPs. A resource can be a soft processor core, a hard processor core, a DSP core, an FPGA block, another dedicated block such as a mixed-signal block, or a memory block (RAM, ROM, etc.). Here we propose the use of a NoC platform consisting of an architecture and a design methodology that scales from a few dozen to several hundred or even thousands of resources [4]. According to Moore's Law, the transistor density of integrated circuits (IC) doubles every 1.5 years [1]. Over the last five decades semiconductor technology has followed this law, and the number of transistors on a single chip has increased exponentially. The proposal rests on three assumptions given in [4]:
1. Moore's Law is still valid and will continue to hold for another 5 years.
2. A single chip will not be able to utilize all of the transistors it contains. A single synchronous clock signal can be provided only over small areas of the chip [2,3,4].
3. Applications modeled for a single chip require a large number of communication tasks. Application characteristics can vary significantly from one application to another: either control or dataflow will dominate, and parts will be reused from earlier products [5]. This makes a heterogeneous implementation, with different kinds of resources for different tasks, the most cost-effective solution.
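As a rough check on the growth implied by the 1.5-year doubling period quoted in the introduction and the five-year horizon of assumption 1, the following Python sketch computes the expected density multiplier. The function name `density_growth` and the idealized exponential model are illustrative assumptions, not part of the cited roadmap.

```python
def density_growth(years: float, doubling_period_years: float = 1.5) -> float:
    """Return the factor by which transistor density grows after `years`,
    assuming ideal doubling every `doubling_period_years` (hypothetical model)."""
    if doubling_period_years <= 0:
        raise ValueError("doubling period must be positive")
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Five more years of doubling every 1.5 years: roughly a tenfold increase.
    print(f"{density_growth(5.0):.2f}x")
```

Under this idealized model, five further years correspond to roughly a tenfold increase in the number of resources a chip could hold, which is the scale the proposed platform is meant to accommodate.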
Fig. 1 Basic NoC architecture. Fig. 2 4x4 NoC topology.
3. Related works
On-chip designs have mainly used a shared bus to interconnect the IP cores. Three shared buses are mainly used in such designs [8]: ARM AMBA, Wishbone, and IBM CoreConnect [9]. All of them suffer from the drawback of non-scalability. Different micro-network proposals can be found in the literature; one of them is Sonics' Silicon Backplane [10]. It is a bus-based architecture in which the IP cores are connected to the bus via specialized interfaces called agents (www.ocpip.org). The cores communicate with their agents using the Open Core Protocol (OCP), and the agents communicate with each other using TDMA bus access schemes. The agents effectively decouple the IP cores from the communication network. Because the basic interconnect is still bus-based, it is subject to the performance degradation trends that are common for buses.
Fig. 3a Mesh. Fig. 3b Torus. Fig. 3c Binary tree. Fig. 3d Butterfly fat tree (BFT).
MIPS Technologies has introduced an on-chip switch for integrating IP cores in a NoC (www.mips.com). The switch is meant to provide a high-performance link between a MIPS processor and multiple third-party peripherals. The proposal is a central switch connected to the different peripherals, but only in point-to-point mode.
Mesh-based interconnect architectures have been proposed by Kumar [6] and Dally [7]. These architectures consist of an m x n mesh of switches interconnecting computational resources (IPs) placed along with the switches [8]. Each switch is connected to four neighboring switches and one IP block, so the number of switches equals the number of IPs. This topology is shown in Fig.3a. Dally and Towles [7] proposed the use of a torus interconnect architecture. A variation of the torus architecture that eliminates the long wraparound wires is called a folded torus. This topology is shown in Fig.3b. Saastamoinen [11] describes the design of a reusable switch to be used in future SoCs; the interconnect architecture, however, is not specifically discussed. Guerrier and Greiner [12] proposed the use of a fat-tree-based interconnect (SPIN) and addressed system-level design issues.
Karim et al. [13] proposed the Octagon network in the context of network processor design. It is a direct network. As in the fat tree topology, the point-to-point delay is determined by the relative locations of source and destination, and communication between any two nodes (within an octagon subnetwork) requires at most two hops. P.P. Pande et al. [14,15] proposed a butterfly fat tree interconnect architecture, a modified form of the fat tree, for a networked SoC, and provided the associated design of the required switches and addressing mechanisms (Fig.3d). All of the above-mentioned works propose some kind of interconnect architecture to solve the global wire delay problem.
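The two-hop bound stated for the Octagon network can be checked with a small Python sketch. It assumes the commonly described Octagon wiring, which the text above does not spell out: eight nodes on a ring, each with an additional link to the node directly opposite. The function names are illustrative only.

```python
from collections import deque
from typing import List

NODES = 8  # one Octagon subnetwork


def octagon_neighbors(node: int) -> List[int]:
    """Assumed wiring: ring links to both neighbors plus a link to the opposite node."""
    return [(node + 1) % NODES, (node - 1) % NODES, (node + NODES // 2) % NODES]


def hop_count(src: int, dst: int) -> int:
    """Breadth-first search for the minimum number of hops from src to dst."""
    distance = {src: 0}
    queue = deque([src])
    while queue:
        current = queue.popleft()
        if current == dst:
            return distance[current]
        for nxt in octagon_neighbors(current):
            if nxt not in distance:
                distance[nxt] = distance[current] + 1
                queue.append(nxt)
    raise ValueError("destination unreachable")


def max_hops() -> int:
    """Largest minimum hop count over all ordered node pairs."""
    return max(hop_count(s, d) for s in range(NODES) for d in range(NODES))
```

Under the assumed wiring, `max_hops()` evaluates to 2, which agrees with the bound given in the text.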
4. Topologies
Network topology refers to the organization of the shared router nodes and channels in an on-chip network. The topology of a NoC can be compared to a roadmap: the channels (roads) transport packets (vehicles) from one router node (crossing) to another [7]. A good topology exploits the features of the available packaging technology to achieve the bandwidth and latency the application requires. Choosing a network topology is the principal step in designing a network, because the routing strategy and flow-control methods depend heavily on the topology. Deciding on a topology also helps in designing the router to be used in the NoC, as clarified in [6]. The network topology determines how the different nodes in a network are connected and communicate with each other. Some of the topologies for NoC are Mesh, Torus, Binary Tree and Butterfly Fat Tree (BFT), which are discussed below.
A. Mesh. This architecture is the most common of all interconnection topologies. Each router, apart from those at the edges, is linked to four adjoining routers and one computation resource (IP) by way of communication channels. It allows a large number of IP cores to be incorporated in a regular-shaped structure. Fig.3(a) shows a 4x4 mesh NoC with 16 IP blocks.
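The mesh connectivity rule can be illustrated with a minimal Python sketch. The (row, column) addressing and the function name `mesh_neighbors` are assumptions made for the example; the paper does not prescribe an addressing scheme.

```python
from typing import List, Tuple


def mesh_neighbors(row: int, col: int, rows: int, cols: int) -> List[Tuple[int, int]]:
    """Return the routers adjacent to (row, col) in a rows x cols mesh.

    Interior routers get four neighbors; edge and corner routers get fewer,
    because a mesh has no wrap-around links.
    """
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]
```

For the 4x4 case, `mesh_neighbors(1, 1, 4, 4)` returns four routers, while the corner call `mesh_neighbors(0, 0, 4, 4)` returns only two.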
B. Torus. The torus architecture shown in Fig. 3(b) is fundamentally similar to a mesh, except that the routers at the edges are linked to the routers at the opposite edge through wrap-around channels. Every router has five ports, one linked to the computational resource and the others linked to the closest neighboring routers. The long wrap-around connections may generate excessive delays.
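A short Python sketch shows how the wrap-around links give every torus router the same port count. The (row, column) addressing, the modular-index formulation and both function names are illustrative assumptions.

```python
from typing import List, Tuple


def torus_neighbors(row: int, col: int, rows: int, cols: int) -> List[Tuple[int, int]]:
    """Return the four routers adjacent to (row, col) in a rows x cols torus.

    Modular arithmetic models the wrap-around channels between opposite edges.
    """
    return [
        ((row - 1) % rows, col),
        ((row + 1) % rows, col),
        (row, (col - 1) % cols),
        (row, (col + 1) % cols),
    ]


def port_count(row: int, col: int, rows: int, cols: int) -> int:
    """Router-to-router ports plus one port for the local IP block."""
    return len(set(torus_neighbors(row, col, rows, cols))) + 1
```

In a 4x4 torus the corner router (0, 0) is linked to (3, 0) and (0, 3) through the wrap-around channels, and `port_count` gives five for every router.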
C. Binary Tree. In the Binary Tree topology, the design is modeled in the form of a tree. Each node in the tree can be denoted by a pair of coordinates (level, position), where level is the vertical level in the tree and position is the horizontal placement in left-to-right order. As depicted in Fig.3(c), each router node is linked to 2 nodes in the subsequent level, and all the resource nodes are located at the bottommost vertical level.
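The (level, position) addressing can be sketched in Python as follows. The numbering convention (root router at level 0, positions counted from 0) and the function names are assumptions for illustration; the text fixes only that each router has two nodes below it and that the resources sit at the bottommost level.

```python
from typing import List, Tuple

Node = Tuple[int, int]  # (level, position)


def children(level: int, position: int) -> List[Node]:
    """The two nodes in the subsequent level linked to router (level, position)."""
    return [(level + 1, 2 * position), (level + 1, 2 * position + 1)]


def parent(level: int, position: int) -> Node:
    """The router one level up that (level, position) is linked to."""
    if level == 0:
        raise ValueError("the root router has no parent")
    return (level - 1, position // 2)


def router_count(num_resources: int) -> int:
    """Routers needed when num_resources leaves hang off a complete binary tree."""
    if num_resources < 2 or num_resources & (num_resources - 1):
        raise ValueError("num_resources must be a power of two, at least 2")
    return num_resources - 1
```

With 16 resource nodes this convention gives 15 routers on levels 0 to 3 and the resources on level 4.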
D. Butterfly Fat Tree. In the Butterfly Fat Tree (BFT) topology, the design is modeled in the form of a tree with butterfly-style links. Each node can be denoted in the same way as in the Binary Tree. The resource (IP) nodes are at the bottommost vertical level, and 4 resource nodes are linked to one router node at the level directly above them. Each router node is linked to 4 nodes below it, which are either router nodes or resource nodes, as depicted in Fig.3(d).
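The grouping of four resources per first-level router can be sketched in Python. The text states only the four-to-one grouping; the level numbering (resources at level 0), the halving of the router count at each higher level, and the number of router levels being log4 of the number of IPs are assumptions taken from the commonly described BFT construction, and the function names are illustrative.

```python
from typing import List, Tuple


def first_level_router(ip_position: int) -> Tuple[int, int]:
    """(level, position) of the router serving the IP at `ip_position`.

    Four consecutive IPs share one level-1 router.
    """
    if ip_position < 0:
        raise ValueError("ip_position must be non-negative")
    return (1, ip_position // 4)


def routers_per_level(num_ips: int) -> List[int]:
    """Router count at levels 1, 2, ... for `num_ips` resources.

    Assumption: num_ips / 4 routers at level 1, halving at each level above,
    with log4(num_ips) router levels in total.
    """
    is_power_of_four = (
        num_ips >= 4
        and num_ips & (num_ips - 1) == 0
        and num_ips.bit_length() % 2 == 1
    )
    if not is_power_of_four:
        raise ValueError("num_ips must be a power of four, at least 4")
    counts = []
    count, remaining = num_ips // 4, num_ips
    while remaining > 1:
        counts.append(count)
        count //= 2
        remaining //= 4
    return counts
```

Under these assumptions `routers_per_level(16)` gives `[4, 2]` and `routers_per_level(64)` gives `[16, 8, 4]`, so the router count stays well below the number of IPs.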
5. Simulation and discussion
We used the ns-2 network simulator (www.isi.edu/nsnam/ns) to perform latency tests for different topologies and numbers of IP cores. We examined the topologies shown in Fig.3a to Fig.3d. The simulation environment was configured to record maximal latency with a shortest-path routing protocol, and the simulation used UDP as the transport protocol. According to the results obtained (Fig.4 and Fig.5), ns-2 is well suited to building simulation models of NoCs. Future work will focus on modifying and extending the simulation model so that its behavior is close to that of a real NoC. Improving the routing algorithms for NoCs will also be a topic of future research.
Fig. 4 Graphical representation for max latency for different topologies, 16 IP and 64 IP cores
Max latency (µs)
Load    4x4 Mesh    4x4 Torus    Binary tree    Butterfly Fat Tree
25%     803.609     802.105      811.428        409.738
50%     803.609     802.105      814.78         410.546
75%     803.609     802.105      831.356        412.14
100%    811.016     802.105      833.18         413.13

Max latency (µs)
Load    8x8 Mesh    8x8 Torus    Binary tree    Butterfly Fat Tree
25%     2392.547    2391.135     2401.728       1603.329
50%     2392.547    2391.135     2403.249       1604.348
75%     2392.547    2391.135     2408.624       1608.223
100%    2401.521    2391.135     2411.325       1609.11
Fig. 5a Maximal latency for 16 nodes Fig.5b Maximal latency for 64 IP nodes
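To make the tabulated results easier to compare, the following Python sketch computes the relative reduction in maximal latency with respect to the mesh at 100% load. The values are copied from the tables above; the unit is assumed to be microseconds, and the dictionary layout and function name are illustrative.

```python
# Maximal latency at 100% load, copied from the tables above (unit assumed: µs).
MAX_LATENCY_US = {
    16: {"Mesh": 811.016, "Torus": 802.105,
         "Binary tree": 833.18, "Butterfly Fat Tree": 413.13},
    64: {"Mesh": 2401.521, "Torus": 2391.135,
         "Binary tree": 2411.325, "Butterfly Fat Tree": 1609.11},
}


def relative_reduction(cores: int, topology: str, baseline: str = "Mesh") -> float:
    """Fraction by which `topology` lowers maximal latency versus `baseline`.

    Negative values mean the topology is slower than the baseline.
    """
    row = MAX_LATENCY_US[cores]
    return 1.0 - row[topology] / row[baseline]


if __name__ == "__main__":
    for cores, row in MAX_LATENCY_US.items():
        for topology in row:
            print(f"{cores:>2} cores, {topology:<18}: "
                  f"{relative_reduction(cores, topology):+.1%}")
```

With these values the butterfly fat tree comes out at roughly half the mesh latency for 16 cores and about two thirds of it for 64 cores, while the torus and the binary tree stay within a few percent of the mesh.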
References
1. Semiconductor Industry Association, International Technology Roadmap for Semiconductors, World Semiconductor Council, Edition 1999.
2. A. Hemani et al., Lowering power consumption in clock by using Globally Asynchronous Locally Synchronous Design style, Proc. of Design Automation Conference, 1999, USA.
3. D. Sylvester and K. Keutzer, Getting to the Bottom of Deep Submicron, Proc. of the Int. Conference on Computer-Aided Design, 1998, pp. 203-211.
4. D. Sylvester and K. Keutzer, Getting to the Bottom of Deep Submicron II: A global wiring paradigm, Proc. of the 1999 Int. Symp. on Physical Design, 1999, pp. 193-200.
5. C. Szyperski, Component Software: Beyond Object-Oriented Programming, Reading, MA, ACM/Addison-Wesley, 1998.
6. S. Kumar, A. Jantsch, et al., A network on chip architecture and design methodology, Proc. of IEEE Computer Society Annual Symposium on VLSI, 2002.
7. William James Dally and Brian Towles, Principles and Practices of Interconnection Networks, Morgan Kaufmann Publishers, San Francisco, 2004.
8. C. Grecu, et al., A Scalable Communication-Centric SoC Interconnect Architecture, Proc. of ISQED 2004, San Jose, California, USA, pp. 343-348, Mar. 2004.
9. www.arm.com; www.silicore.net/wishbone.htm; www-3.ibm.com/chips/products/coreconnect/
10. D. Wingard, MicroNetwork-Based Integration for SoCs, Proc. DAC 2001, USA, 2001, pp. 673-677.
11. I. Saastamoinen, et al., Interconnect IP Node for Future System-on-Chip Designs, Proc. of the First IEEE Int. Workshop on Electronic Design, Test and Applications, pp. 116-120, 2002.
12. P. Guerrier, A. Greiner, A generic architecture for on-chip packet-switched interconnections, Proc. of Design, Automation and Test in Europe Conference and Exhibition 2000, pp. 250-256.
13. F. Karim, A. Nguyen, S. Dey, An interconnect architecture for networking systems on chips, IEEE Micro, vol. 22, issue 5, Sept. 2002, pp. 36-45.
14. P.P. Pande, et al., Design of a Switch for NoC Applications, Proc. ISCAS, pp. 217-220, May 2003.
15. P.P. Pande, et al., High-Throughput Switch-Based Interconnect for Future SoCs, Proc. 3rd IEEE Int. Workshop on System-on-Chip for Real-Time Appl., Calgary, Canada, pp. 304-310, June-July 2003.