Scientific Works of the Union of Scientists in Bulgaria-Plovdiv, Series C. Technics and Technologies, Vol. XIII, Union of Scientists, ISSN 1311-9419, Session 5 - 6 November 2015.
PERFORMANCE TESTING METHODS FOR NOC-BASED SMART ETHERNET SWITCHES
Ile Dimitrievski, Valentin S. Mollov
Department of Computer Systems, Technical University of Sofia, Bulgaria
Abstract: Network-on-Chip (NoC) based single-chip networking devices are designed to achieve maximal performance, so appropriate methods for testing that performance must be developed. The performance of NoC-based Ethernet smart switches has improved rapidly, and they are required to meet requirements such as the lowest possible time delay and overall latency, increased traffic speed through the network switch, and increased bandwidth and throughput. State-of-the-art methods for testing the fabric performance of NoC-based smart Ethernet switches are presented. The performance of different switching algorithms in NoC-based smart Ethernet switches is presented and discussed. An overview of selected methods is given, together with an introduction to simulating these performance metrics.
Keywords: NoC, average latency, message throughput, energy
1. Introduction
NoC design methodology is expected to change radically in the coming years. According to [4,5,6], future NoC platforms will consist of a large set of embedded processors. Numerous IP cores, performing various functions and working at different clock frequencies, will be integrated on these NoCs. The basic NoC structure is given in Fig. 1. One of the main problems in future NoC design arises from the non-scalability of global wires and the delay they cause. Global wires carry signals across the chip, and their length does not scale with the technology. For a relatively long bus line, the intrinsic and parasitic resistance and capacitance can be quite high.
2. Related works
The most frequently used on-chip interconnect architecture is the shared-medium arbitrated bus, where all communicating devices share the same transmission medium. The advantages of the shared-bus architecture are simple topology, low area cost, and extensibility. The basic NoC topologies considered in this paper are shown in Fig. 1.
Fig. 1. Basic NoC topologies: a) Torus, b) Mesh, c) Binary tree, d) Butterfly fat tree
3. Performance metrics
To compare and contrast different NoC architectures, a standard set of performance metrics can be applied [22], [27], [1]. For example, it is desirable that a NoC interconnect architecture exhibit high throughput, low latency, energy efficiency, and low area overhead. In today's power-constrained environments, it is critical to be able to identify the most energy-efficient architectures and to quantify the energy-performance trade-offs [1]. Generally, the additional area overhead due to the infrastructure IPs should be reasonably small. We now describe these metrics in more detail.
3.1 Message Throughput
The performance of a digital communication network is characterized by its bandwidth, measured in bits/sec. Here, however, we are more concerned with the rate at which message traffic can be sent across the network, so throughput is a more appropriate metric. Throughput can be defined in different ways depending on the specifics of the implementation, i.e. the NoC topology. In general, for message-passing systems, the message throughput, TP, can be defined as:
$$TP = \frac{(\text{Total messages completed}) \times (\text{Message length})}{(\text{Number of IP blocks}) \times (\text{Total time})} \tag{1}$$
where Total messages completed refers to the number of whole messages that successfully arrive at their destination IPs, Message length is measured in flits, Number of IP blocks is the number of functional IP blocks involved in the communication, and Total time is the time (measured in clock cycles) that elapses between the occurrence of the first message generation and the last message reception. Thus, the message throughput is measured as the fraction of the maximum load that the network is capable of physically handling. An overall throughput of TP=1 corresponds to all end nodes receiving one flit every cycle. Accordingly, throughput is measured in flits/cycle/IP. Throughput signifies the maximum value of the accepted traffic and it is related to the peak data rate sustainable by the system [1].
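As a small illustration of Eq. (1), the following Python sketch computes the message throughput in flits/cycle/IP. The function name and the sample numbers are hypothetical and chosen only to show the arithmetic; they are not taken from the simulations in this paper.

```python
def message_throughput(messages_completed, message_length_flits,
                       num_ip_blocks, total_time_cycles):
    """Message throughput per Eq. (1), in flits/cycle/IP."""
    if num_ip_blocks <= 0 or total_time_cycles <= 0:
        raise ValueError("IP block count and total time must be positive")
    delivered_flits = messages_completed * message_length_flits
    return delivered_flits / (num_ip_blocks * total_time_cycles)


# Hypothetical run: 16 IP blocks observed for 1000 cycles,
# 2000 complete messages of 4 flits each delivered.
tp = message_throughput(2000, 4, 16, 1000)
print(tp)  # 0.5, i.e. each IP receives one flit every two cycles on average
```

A value of 1.0 would correspond to the saturation case described above, in which every end node receives one flit per cycle.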
3.2 Transport Latency
Latency is defined as the time (in clock cycles) that elapses between the occurrence of a message header injection into the network at the source node and the occurrence of a tail flit reception at the destination node [7]. We refer to this simply as latency in the remainder of this paper. In order to reach the destination node from some starting source node, flits must travel through a path consisting of a set of switches and interconnect, called stages. Depending on the source/destination pair and the routing algorithm, each message may have a different latency. There is also some overhead in the source and destination that contributes to the overall latency. Therefore, for a given message i, the latency $L_i$ is:
$$L_i = \text{sender overhead} + \text{transport latency} + \text{receiver overhead} \tag{2}$$
We use the average latency as a performance metric in our evaluation methodology. The average latency is crucial for evaluating the performance of a NoC. Let P be the total number of messages reaching their destination IPs and let $L_i$ be the latency of each message i, where i ranges from 1 to P. The average latency, $L_{avg}$, is then calculated as:
$$L_{avg} = \frac{\sum_{i=1}^{P} L_i}{P} \tag{3}$$
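The per-message latency of Eq. (2) and the average of Eq. (3) can be sketched in Python as follows. The `MessageTiming` class and the cycle counts are illustrative assumptions, not measurements from this paper.

```python
from dataclasses import dataclass


@dataclass
class MessageTiming:
    """Per-message timing components, all in clock cycles (hypothetical record)."""
    sender_overhead: int
    transport_latency: int
    receiver_overhead: int

    def latency(self) -> int:
        # Eq. (2): L_i = sender overhead + transport latency + receiver overhead
        return self.sender_overhead + self.transport_latency + self.receiver_overhead


def average_latency(messages):
    """Eq. (3): mean latency over the P messages that reached their destination."""
    if not messages:
        raise ValueError("at least one delivered message is required")
    return sum(m.latency() for m in messages) / len(messages)


# Three hypothetical messages with latencies of 14, 18 and 22 cycles.
sample = [MessageTiming(2, 10, 2), MessageTiming(2, 14, 2), MessageTiming(2, 18, 2)]
print(average_latency(sample))  # 18.0
```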
3.3 Energy
When flits travel on the interconnection network, both the interswitch wires and the logic gates in the switches toggle, which results in energy dissipation [1]. In this paper, we are concerned with the dynamic energy dissipation caused by the communication process in the network. The flits from the source nodes need to traverse multiple hops consisting of switches and wires to reach their destinations. We determine the energy dissipated by the flits in each interconnect and switch hop. The energy per flit per hop is given by:
$$E_{hop} = E_{switch} + E_{interconnect} \tag{4}$$
where $E_{switch}$ and $E_{interconnect}$ depend on the total capacitances and signal activity of the switch and each section of interconnect wire, respectively. They are determined as follows:
$$E_{switch} = \alpha_{switch} C_{switch} V^2 \tag{5}$$
$$E_{interconnect} = \alpha_{interconnect} C_{interconnect} V^2 \tag{6}$$
$\alpha_{switch}$ and $\alpha_{interconnect}$ are the signal activities, and $C_{switch}$ and $C_{interconnect}$ the total capacitances, of the switches and wire segments, respectively. V is the supply voltage. The energy dissipated in transporting a packet consisting of n flits over h hops is n times the sum of the per-hop energies; averaged over P packets, the energy per packet can be calculated as:
$$\bar{E}_{packet} = \frac{\sum_{i=1}^{P} E_{packet_i}}{P} = \frac{\sum_{i=1}^{P} \left( n_i \sum_{j=1}^{h_i} E_{hop,j} \right)}{P} \tag{7}$$
The parameters $\alpha_{switch}$ and $\alpha_{interconnect}$ capture the fact that the signal activities in the switches and the interconnect segments are data-dependent, e.g., there may be long sequences of 1s or 0s that do not cause any transitions. Any of the low-power coding techniques that aim to reduce the number of transitions can be applied to any of the topologies described here. For the sake of simplicity and without loss of generality, we do not consider any specialized coding techniques in our analysis.
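The energy model of Eqs. (4) to (7) can be sketched in Python as shown below. The sketch assumes the usual dynamic-energy form in which the energy is proportional to the activity factor, the capacitance, and the square of the supply voltage. All function names and numeric values (activities, capacitances, voltage, packet sizes) are hypothetical and serve only to illustrate how the terms combine.

```python
def hop_energy(alpha_switch, c_switch, alpha_interconnect, c_interconnect, v):
    """Energy per flit per hop, Eq. (4), assuming E = alpha * C * V^2 for each term."""
    e_switch = alpha_switch * c_switch * v ** 2                    # Eq. (5)
    e_interconnect = alpha_interconnect * c_interconnect * v ** 2  # Eq. (6)
    return e_switch + e_interconnect


def packet_energy(n_flits, hop_energies):
    """Energy of one packet: n flits, each crossing every hop on the path."""
    return n_flits * sum(hop_energies)


def average_packet_energy(packets):
    """Eq. (7): mean over P packets, each given as (n_flits, [E_hop per hop])."""
    if not packets:
        raise ValueError("at least one packet is required")
    return sum(packet_energy(n, hops) for n, hops in packets) / len(packets)


# Hypothetical values: activity 0.5, 2 pF switch, 1 pF wire segment, 1.0 V supply.
e_hop = hop_energy(0.5, 2e-12, 0.5, 1e-12, 1.0)   # about 1.5 pJ per flit per hop
packets = [(4, [e_hop] * 3),   # 4 flits over 3 hops
           (4, [e_hop] * 5)]   # 4 flits over 5 hops
print(average_packet_energy(packets))  # about 2.4e-11 J
```

Because the activity factors are data-dependent, a real evaluation would derive them from the simulated traffic rather than fix them as constants.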
4. Simulation results and discussion
We used the ns2 simulator to simulate the throughput parameter [8]. The constraints applied during simulation are shown in Table 1 and the corresponding results in Fig. 2 to Fig. 4. The wormhole switching technique and a shortest-path algorithm were implemented on the different NoC topologies. The key factor evaluated in this case study is the throughput for different topologies and different numbers of IP cores.
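One structural property that plausibly influences how shortest-path routing behaves on these topologies is the average hop count between IP cores. The following Python sketch is not the ns2 setup used in this paper; it is a simplified illustration that uses breadth-first search to compare the average shortest-path length of a k x k mesh and a k x k torus. The function names are hypothetical.

```python
from collections import deque


def grid_neighbors(x, y, k, wrap):
    """Yield neighbours of node (x, y) in a k x k mesh (wrap=False) or torus (wrap=True)."""
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if wrap:
            yield nx % k, ny % k          # torus: edges wrap around
        elif 0 <= nx < k and 0 <= ny < k:
            yield nx, ny                  # mesh: no links beyond the border


def average_hops(k, wrap):
    """Average shortest-path hop count over all ordered pairs of distinct nodes."""
    nodes = [(x, y) for x in range(k) for y in range(k)]
    total_hops = 0
    for src in nodes:
        dist = {src: 0}
        queue = deque([src])
        while queue:                      # breadth-first search from src
            cur = queue.popleft()
            for nxt in grid_neighbors(cur[0], cur[1], k, wrap):
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        total_hops += sum(dist.values())
    return total_hops / (len(nodes) * (len(nodes) - 1))


print(f"4x4 mesh:  {average_hops(4, wrap=False):.3f}")  # 2.667
print(f"4x4 torus: {average_hops(4, wrap=True):.3f}")   # 2.133
```

Hop count is only one factor; link bandwidth, queueing, and traffic pattern also determine the measured throughput.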
Table 1. Constraints applied in ns2 to simulate the NoCs

| NoC model parameter | Constraint applied in NS2 |
| --- | --- |
| Number of Resources | 16 IP cores |
| Connections | Resource-Router, Router-Router |
| Transmission Protocol | User Datagram Protocol (UDP) |
| Routing Scheme | Static |
| Routing Protocol | Shortest Path |
| Queue mechanism | Stochastic Fairness Queuing (SFQ) |
| Link Queue | 8 packets |
| Bisection Bandwidth (Max.) | Router-to-router: 300 Mb; Resource-to-router: 200 Mb |
| Traffic Generation | Constant Bit Rate (CBR) |
| Traffic Rate | 180 Mb |
| Packet Size | 16 bytes |

Fig. 2. Average throughput for different topologies and numbers of IP cores

Fig. 3. Average throughput with 16 IP cores. Average throughput (Mbps):

| Load | 4X4 Mesh | 4X4 Torus | Binary tree | Butterfly Fat Tree |
| --- | --- | --- | --- | --- |
| 25% | 35.945 | 35.862 | 32.753 | 32.659 |
| 50% | 65.12 | 69.781 | 58.842 | 59.783 |
| 75% | 100.869 | 103.853 | 59.894 | 68.548 |
| 100% | 115.934 | 130.964 | 63.792 | 70.158 |

Fig. 4. Average throughput with 64 IP cores. Average throughput (Mbps):

| Load | 8X8 Mesh | 8X8 Torus | Binary tree | Butterfly Fat Tree |
| --- | --- | --- | --- | --- |
| 25% | 8.659 | 8.568 | 8.058 | 8.026 |
| 50% | 16.247 | 17.237 | 14.752 | 14.892 |
| 75% | 25.178 | 25.632 | 14.293 | 17.451 |
| 100% | 28.589 | 31.641 | 15.491 | 19.058 |

5. Conclusions and future work
The simulations performed with the ns2 network simulator show how one of the key performance metrics, throughput, behaves for the different NoC topologies. A deep empirical investigation was carried out with respect to the performance of NoCs. For future work, we plan to improve the empirical performance equations. The main directions will be reducing the energy consumed in transferring a single flit and improving the existing routing algorithms to achieve minimal latency and maximal throughput. Another important research direction is the silicon area occupied by the Ethernet smart switch.
6. References
1. P. Pande, C. Grecu, et al., Performance Evaluation and Design Trade-Offs for Network-on-Chip Interconnect Architectures, IEEE Trans. on Computers, vol. 54, no. 8, Aug. 2005.
2. C. Grecu, A. Ivanov, R. Saleh, P. Pande, Testing Network-on-Chip Communication Fabrics, IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 2007.
3. T. Reddy, J. Singh, K. Mahapatra, Performance assessment of different NoC topologies, 2nd International Conference on Devices, Circuits and Systems (ICDCS), pp. 1-5, 2014.
4. L. Benini and G. De Micheli, Networks on Chips: A New SoC Paradigm, Computer, vol. 35, no. 1, pp. 70-78, Jan. 2002.
5. P. Magarshack and P.G. Paulin, System-on-Chip beyond the Nanometer Wall, Proc. Design Automation Conf. (DAC), pp. 419-424, June 2003.
6. M. Horowitz and B. Dally, How Scaling Will Change Processor Architecture, Proc. Int. Solid State Circuits Conf. (ISSCC), pp. 132-133, Feb. 2004.
7. P. Pande, C. Grecu, et al., Design of a Switch for Network on Chip Applications, Proc. Int. Symp. Circuits and Systems (ISCAS), vol. 5, pp. 217-220, May 2003.
8. ns 2 website [Online]. Available: http://www.isi.edu/nsnam/ns