
[IEEE 2007 3rd Southern Conference on Programmable Logic - Mar del Plata, Argentina (2007.02.28-2007.02.26)] 2007 3rd Southern Conference on Programmable Logic - Quantitative Comparison


QUANTITATIVE COMPARISON OF SWITCHING STRATEGIES FOR NETWORKS ON CHIP

Anthony Leroy
BEAMS department, Universite Libre de Bruxelles
CP 165/56, avenue F. D. Roosevelt 50, 1050 Bruxelles
email: [email protected]

Julien Picalausa, Dragomir Milojevic
Service des Systemes Logiques et Numeriques, Universite Libre de Bruxelles
CP 165/57, avenue F. D. Roosevelt 50, 1050 Bruxelles
email: [email protected]

ABSTRACT

To ensure low power consumption while maintaining flexibility and performance, future Systems-on-Chip (SoC) will integrate many processor nodes and memory units. To interconnect these IP nodes, Networks-on-Chip (NoC) have been proposed as an efficient and scalable alternative to shared buses. One major problem is the difficulty of comparing choices and strategies in NoC design. To tackle this problem, we propose a complete, highly configurable framework called Polymorpher, which enables a quantitative comparison of the performance and energy consumption of different NoC communication component architectures. Our models are based on a set of basic VHDL communication components that can be reused for different designs. This common test-bed allows us to fairly and accurately compare different types of communication components in terms of energy consumption, delay and area. The framework enables easy instantiation and exploration of different types of routers. We have chosen to explore different switching strategies and parameters as an example of the possibilities offered by our tool. Our study quantitatively compares different switching techniques widely used in NoCs (Store and Forward, Virtual Cut Through, Wormhole) in terms of power consumption, area overhead and delay with a post lay-out gate-level simulation.

1. INTRODUCTION

Submicron technologies offer a considerable amount of resources; billion-transistor chips are expected within this decade. Such technologies enable integration of large Systems on Chips (SoCs) and Multiple Processors SoCs (MPSoC) containing a wide variety of computing nodes and memories.

The integration of such complex systems involves many problems related to the design, processing and verification. Another major problem is related to the communication between different nodes. While integration scaling reduces gate delay, this is not true for wires. Traditional on-chip communication architectures based on buses will no longer be adequate for future Systems-on-Chip because the bandwidth they can provide is too low for certain applications. Furthermore, the energy consumption of buses is also too high due to the large interconnect that has to be driven.

The research community has therefore proposed the Networks on Chip (NoC) paradigm as a high-performance, highly scalable and modular alternative to buses [1] [2] [3]. However, while a considerable effort has been deployed in NoC development, very little research has reported on the energy and power efficiency of NoCs.

A wide variety of NoC architectures have been proposed so far (see Section 2). NoC design consists of a large number of decisions with many alternatives and it is difficult to compare them on the same test-bed. The main current problem is to be able to compare different solutions proposed in the literature, which often rely on different assumptions, technological libraries and test-benches.

To tackle this problem, we propose a highly configurable framework for NoCs, called Polymorpher, which enables a quantitative comparison of the area, critical path and energy consumption of different router architectures.

This paper presents the details of our framework and illustrates its exploration possibilities with the choice of the network's switching technique.

The remainder of this paper is structured as follows. Section 2 describes related work and focuses more specifically on energy requirements and power dissipation. Section 3 presents a brief overview of our framework, focusing particularly on the different switching strategies that can be implemented. In Section 4 we report the experimental methodology and results obtained. Finally the paper concludes with Section 5.

1-4244-0606-4/07/$20.00 ©2007 IEEE.


2. RELATED WORK

Related work mainly concerns the evaluation of NoC power consumption, a topic for which the lack of standardization in experimental set-up makes comparison between different studies very difficult.

In the past couple of years, NoCs have been investigated as an alternative solution for communication infrastructure in homogeneous and heterogeneous SoCs having ten or more nodes (processors, IP blocks, memories, etc.). When compared to buses, NoCs have the advantage of high bandwidth capabilities; they are flexible, scalable and even energy efficient, as has been recently reported in [4, 5].

Many different approaches to the NoC communication paradigm have been proposed in the literature in the past couple of years. In [6] a fully synchronous, pipelined NoC called xPipes has been proposed, featuring wormhole switching with static routing tables. A similar NoC, called Proteo, introducing asynchronous links, has been proposed in [7]. Quality of Service (QoS) for NoCs has been introduced in Nostrum [8] using Time Division Multiple Access (TDMA). Finally, Philips Research Laboratories have proposed a contention-free, pipelined NoC, called AEthereal, implementing best-effort (BE) and guaranteed throughput (GT) QoS [9].

Today, there are also a few commercially available NoC solutions, providing a more or less completely automated process of design, simulation and implementation of the NoC paradigm in silicon. One can mention Arteris and Silistix. While the Arteris NoC proposes a GALS approach [4], the Silistix NoC is completely asynchronous [10].

A priori we can doubt that NoCs are power efficient: the raw data first has to be encapsulated in packets in the network interface unit on the master side. These packets then have to travel through a certain number of routers, where they may be buffered before being routed to the next router. Finally, the raw data has to be extracted from the packets in the slave network interface, before reaching the target.

A specific study about the energy requirements of individual NoC elements has been proposed in [5]. The energy requirements of routers and wires have been evaluated for 130 nm technology at Vdd = 1 V. The results show that the energy required per bit transferred is 0.98 pJ/bit for a packet switched router. For a circuit switched router this energy has been evaluated to be 0.37 pJ/bit.

Similar results for the energy requirements of a circuit switched router have been reported in [11]. For an 8-port router, built in 100 nm technology, the energy of a bit transfer has been evaluated to be 0.47 pJ/bit.

Very recently the power efficiency of a complete NoC has been discussed in [12]. Their 25 mm² chip, built in 180 nm technology, features a simple NoC running at 1.6 GHz and offering 11.2 GB/s of aggregate bandwidth. The power budget of the complete chip is 160 mW, of which 51 mW is due to the NoC in maximum traffic conditions.

In conclusion, the results presented here often rely on very different assumptions, technologies and experimental setups. Comparison between the options is thus very difficult to make. A standard test-bed would thus be very beneficial in this context.

3. DESIGN OF THE POLYMORPHER NOC FRAMEWORK

We propose a highly configurable framework called Polymorpher as a common test-bed for NoC comparison.

This section first describes the generic NoC architecture on which our framework is based. Then, a detailed description of the implementation of the framework is presented.

3.1. Generic Network on Chip architecture

In most Networks on Chip, IP nodes are connected to their own router through a network interface. Routers are interconnected by point-to-point links to form a given network topology (e.g. mesh, torus, ...). Their role is to forward data from the source to the destination IP.

The structure of a generic router is presented in Fig. 1. A switch connects the router input ports to the output ports. The configuration of the switch is set up by the routing module. The routing decision can be performed either within the router or by an external module, in which case the switch configuration is generally stored in an internal routing table. Optional FIFO queues can be used for input or output packet buffering. Finally, link controllers (LC) manage transactions on the links interconnecting adjacent router ports.

Fig. 1. Structure of a generic router

NoCs are generally designed following the OSI model, and in particular three main layers: the physical, switching and routing layers. The physical layer refers to the link level protocols for transferring messages and managing the physical channel between adjacent routers. The switching layer utilizes these physical layer protocols to implement mechanisms for forwarding messages through the network.


The routing layer makes routing decisions to determine candidate output ports at intermediate routers and establish paths through the network.

3.2. Polymorpher NoC framework

The Polymorpher NoC framework is a very flexible high-level NoC component fabric. It allows the designer to describe components such as routers and network interfaces in a very modular way.

The complete framework is fully written in VHDL, thus allowing the designer to get accurate post-synthesis results.

The framework offers a set of generic basic communication modules and some specific modules. Generic basic modules are common to most implementations. They include link controllers, crossbars and FIFO queues. Specific blocks describe high-level functionalities like the switching technique or the routing strategy.

For NoC parameter exploration, communication components can be easily adapted for different options such as topology, routing strategy and switching technique by just plugging the modified components into the framework.

This paper currently focuses only on router implementation. Network interfaces will be addressed in future work.

One important feature of our framework is that the router design is very modular. This allows the designer to study different parameters independently and to ensure that differences in terms of performance or power consumption come from one specific decision.

The framework also offers fully configurable test-benches with various built-in traffic generators and traffic checking units that ensure that no data loss happened.

Various traffic generators have been implemented and can be plugged into the NoC framework. Temporal patterns and the size of the packets can be set, as well as spatial patterns.

Three different traffic patterns are considered in our test-benches, based on an effort of standardization proposed by [13]. Those are spatial patterns defining the connections between routers: uniform, where the destination is chosen randomly; bit complement, where the destination is chosen by inverting the bits of the source address (Fig. 2 (a)); and fork-join pipeline (Fig. 2 (b)), where flows fork in the network and are then brought together again.

Fig. 2. Spatial pattern: (a) bit complement spatial pattern, (b) fork-join pipeline spatial pattern

4. EXPERIMENTS AND RESULTS

This section presents our experimental results concerning the switching technique exploration.

The choice of the switching technique is crucial in NoC design because it will mainly impact the network latency. It will also have a considerable influence on energy consumption and overall network performance.

To our knowledge, no complete exploration of switching techniques for Networks-on-Chip has been published so far.

We have chosen to explore the performance and energy consumption of the three basic switching techniques: Store and Forward (SAF), Virtual Cut Through (VCT) and Wormhole (WH).

This section first describes the different switching techniques. The router model implementation is then discussed. Finally, our experimental set-up and results for the different switching techniques are presented.

4.1. Switching technique decision

Switching techniques can be classified in two major groups: packet switching and virtual circuit-switching.

The traditional packet-switching technique consists in splitting messages that have to be sent over the network into small, independently routed pieces of information called packets. Each packet consists of a header containing the control information needed by the routing function and a payload containing the actual data.

Packet-switching allows communication links to be shared among packets corresponding to different messages, resulting in better bandwidth utilization. As no full path pre-establishment overhead is required, packet-switching techniques are well suited for infrequent short messages. Packet-switched macroscopic networks generally exploit the Store and Forward (SAF) switching technique: a packet can be forwarded to its corresponding router output port only when it has completely arrived at the input port (see Fig. 3 (a)).

The Virtual Cut Through (VCT) switching technique has been introduced to reduce the latency of SAF (see Fig. 3 (b)). In VCT, the switching granularity is smaller than a packet. As soon as the output port needed by the packet is free, elements of the packet called flits are sent immediately even if the packet has not fully arrived. If the output port of the packet is busy, the packet is stored in a buffer. Buffer space is the same as for SAF and can contain several packets. That technique can considerably improve delay in the absence of
contention.

Fig. 3. Switching techniques: (a) Store and Forward, (b) Virtual Cut Through / Wormhole

The Wormhole (WH) switching technique is comparable to VCT but it uses less buffer space than is required for a whole packet. When the packet needs to access a router output port which is blocked, i.e. used by another packet, the packet remains distributed over all previous router buffers along the path. Buffer space is typically dimensioned to the size of a few flits, resulting in faster and smaller routers.

Wormhole networks are therefore generally well adapted to the context of Networks-on-Chip, for which chip area is limited. That technique can considerably improve delay in the absence of contention. Like the other packet-switching techniques, wormhole is generally not well adapted for long messages, which are commonly encountered in multimedia applications. The Virtual-Circuit Switched technique better handles this kind of traffic.

In circuit switching, an application establishes a connection from source to destination and uses it exclusively.

The connection is created by a routing probe injected in the control plane of the network prior to the data transmission.

The circuit switching option is efficient when data is sent as one long message, i.e. when the data transmission time is much longer than the connection set-up time. A drawback of standard circuit switching is that only one application can use the reserved links for the needed duration. With the introduction of multiplexing techniques this problem can be alleviated.

The most common way to implement a virtual circuit-switched network is to use Time-Division Multiplexing to share the bandwidth allocated to the circuits. In this case, the network resources are shared temporally among the different circuits. Time is discretized in periods of fixed duration called time-slots. During a time-slot, the available bandwidth is exclusively dedicated to a given circuit. The number of time-slots that can be allocated corresponds to the maximum number of circuits that can be allocated at the same time on the same link.

4.2. Router models

For the sake of preserving as many modules as possible across the different router implementations for the various switching techniques, packets are also split into flits in the Store and Forward strategy. In that particular case, all the flits are sent together only when they have all arrived at the input buffer.

The block diagram of a router is presented in Fig. 4. Each router input is connected to a link controller. This unit performs the synchronization of flits with the internal clock.

Fig. 4. Block diagram of the configurable router used for implementation and experiments.

Flits are then sent to an input FIFO where they are buffered. The size of that FIFO will depend on the switching technique and can influence the energy consumption and area overhead of the router. Input buffering is shown in the figure but different implementations are possible (output buffering or virtual output buffering).

The FIFO is then followed by the routing component, which analyzes the header of incoming packets and performs the switching decision.

The switching component, usually a crossbar, implements this switching decision. The module also checks that the conditions to send flits to the next router are fulfilled. For packet switching, for instance, it checks that the whole packet has been received and that the next router has enough free space to accept a new packet. When both parts allow the packet to be sent, flits are sent one by one through the crossbar.

The packets finally exit the router through an output link controller.

If the router also includes Guaranteed Throughput (GT) facilities, a small bypass circuit redirects the GT flits outside of the packet switching Best Effort (BE) part, immediately after the link controller. The GT flits are directly sent
to the corresponding output of the router, without any additional latency. The switching component integrates an arbitration module which can possibly perform arbitration between potential best effort and guaranteed throughput traffic.

This generic router architecture allows the designer to replace any component at will by an equivalent one and thus very easily test different design alternatives.

4.3. Experimental set-up and results

Our experimental setup is based on a 4x4 mesh Network-on-Chip platform. Routers have been successively implemented to perform store and forward, virtual cut through and wormhole switching techniques.
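As a rough illustration of why the switching technique matters for latency on such a mesh, the following Python sketch compares the three techniques in the contention-free case. It is not part of the Polymorpher framework: the function names are hypothetical, and the model assumes that one flit crosses one link per cycle, that routers add no pipeline delay and that packets follow a minimal path. Under these assumptions VCT and wormhole behave identically, since they differ only in how blocked packets are buffered.

```python
"""Zero-contention latency sketch for SAF, VCT and wormhole switching on a 2D mesh.

Assumptions (not taken from the paper): one flit crosses one link per cycle,
routers add no pipeline delay, packets follow a minimal path, and no output
port is ever busy. All names are illustrative.
"""


def hops(src, dst):
    """Number of links on a minimal path between two (x, y) mesh coordinates."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def saf_latency(n_hops, packet_flits):
    """Store and Forward: every hop waits for the whole packet before forwarding."""
    return n_hops * packet_flits


def cut_through_latency(n_hops, packet_flits):
    """VCT or wormhole without contention: the header advances one hop per
    cycle and the remaining flits follow it in a pipeline."""
    return n_hops + packet_flits - 1


if __name__ == "__main__":
    src, dst = (0, 0), (3, 3)  # opposite corners of a 4x4 mesh
    n = hops(src, dst)
    for flits in (1, 5, 10):
        print(flits, saf_latency(n, flits), cut_through_latency(n, flits))
```

For a 10-flit packet (the maximum packet size used in the experiments) travelling between opposite corners of the mesh, this simplified model gives 60 cycles for SAF against 15 cycles for VCT or wormhole. The gap shrinks to zero for single-flit packets, which is why the benefit of cut-through techniques depends on packet length as well as on distance.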

A two-port VCT router is presented in Fig. 5. It is mainly based on the components that have been described earlier: input and output link controllers (LC), input FIFO buffers and a crossbar. The header decoding unit extracts control information from the header.

Fig. 5. Structure of the VCT router

The SAF router (Fig. 6) exploits the same structure as the VCT router. The main difference is the end pktfinder module. This component detects the end of the current packet for variable packet size networks.

Fig. 6. Structure of the SAF router

The WH router is also based on a structure similar to that of the VCT router. As can be seen on Fig. 6, the main differences are the virtual channel multiplexers/demultiplexers and a modified crossbar component to cope with different virtual channels reaching the same output.

In all the designs, flits are sent over the inter-router links in small data units called phits that have the size of the router port width. In our case, flits are composed of three 32-bit phits.

Both SAF and VCT routers have one input FIFO queue per port that can contain up to 4 packets (40 flits). The wormhole router has three input FIFO queues per port to support three virtual channels. Each input queue can contain up to 5 flits.

We considered a uniform random distribution of the packet size with a maximum packet size of 10 flits. The packet injection rate is based on a uniform random distribution. The spatial traffic pattern is uniform.

Results are presented for the same 5-port router of the NoC for the different switching techniques.

Our comparison is in no way restricted to this particular case study and setup, but it gives a concrete setting to produce absolute values for power and area.

All delay, energy consumption and area estimations have been performed after synthesis with Synopsys Design Compiler with the 130 nm UMC standard cells technology library in average conditions (1.2 V, 25 C).

Table 1 shows area results for the different techniques. The wormhole router clearly appears as the smallest design because its FIFO input queues are smaller.

Although it has the same input queue size, the SAF router has a larger area overhead than the VCT router because the SAF router integrates a quite complex packet-end detector to accept variable length packets.

Areas are of the same order of magnitude as the results obtained for AEthereal routers (0.29 mm²) [14]. The difference mainly comes from the fact that they also implement GT components and they are considering 8 flits per FIFO. They are also using an advanced 130 nm technology library (Philips CMOS12).

The energy consumption is obtained with Power Compiler by performing a switching activity annotation of the design during a gate-level netlist simulation performed with
Mentor Graphics Modelsim.

Switching technique    Area (mm²)    Power cons. (mW)
Store and Forward      0.929         11.6
Virtual Cut Through    0.732         6.82
Wormhole               0.593         3.23

Table 1. Comparison of the different switching techniques (area and power consumption)

Power consumption results are presented in Table 1 for the 5-port router (1 incoming flit per port per cycle). The wormhole router clearly has the smallest power consumption because its FIFOs are smaller than in the two other routers. This is generally the case when no deadlock occurs in the network. The impact of contention has not yet been studied and is short term future work.

The SAF router consumes more power than the VCT router mainly because the whole FIFO queue is activated for each packet, while in VCT only part of it is activated (only one flit when no contention occurs).

5. CONCLUSIONS

The framework that we have presented allows designers to perform efficient comparisons between NoC architectures and to easily test different options.

The potential of our tool has been illustrated with an exploration of part of the switching technique design space. The wormhole technique clearly appeared to be the most efficient strategy in terms of power consumption and area overhead in a deadlock-free case study.

Our framework would be particularly useful for high level modeling of NoC power dissipation estimation for various implementation scenarios where different architectures of the MPSoC system and application mapping are foreseen.

Future work mainly consists in building network interfaces with our basic communication components. We plan to couple our framework with an explorative algorithm to create innovative NoC instances.

6. ADDITIONAL AUTHORS

Frederic Robert (Universite Libre de Bruxelles) and Diederik Verkest (IMEC/VUB/KUL)

7. REFERENCES

[1] B. T. W. Dally, "Route packets, not wires: Interconnect woes through communication-based design," in Proc. of the 38th Design Automation Conference, 2001.

[2] A. Jantsch and H. Tenhunen, Networks on Chip. Kluwer Academic Publishers, 2003.

[3] L. Benini and G. D. Micheli, "Networks on chips: A new soc paradigm," Computer, vol. 35, no. 1, pp. 70-78, 2002.

[4] ARTERIS, "A comparison of network-on-chip and buses," Internet address: http://www.arteris.com/noc-whitepaper.pdf, 2005.

[5] P. T. Wolkotte, G. J. M. Smit, and J. E. Becker, "Energy-efficient noc for best-effort communication," in Proceedings of the 15th International Conference on Field Programmable Logic and Applications 2005 (FPL 2005), Tampere, Finland, T. Rissa, S. Wilton, and P. Leong, Eds. Piscataway, NJ, USA: IEEE Circuits and Systems Society, August 2005, pp. 197-202.

[6] D. Bertozzi and L. Benini, "Xpipes: A Network-on-Chip Architecture for Gigascale Systems-on-Chip," IEEE Circuits and Systems Magazine, vol. 4, 2004.

[7] D. Sigenza-Tortosa, T. Ahonen, and J. Nurmi, "Issues in the development of a practical noc: the proteo concept," Integr. VLSI J., vol. 38, no. 1, pp. 95-105, 2004.

[8] M. Millberg, E. Nilsson, R. Thid, S. Kumar, and A. Jantsch, "The nostrum backbone - a communication protocol stack for networks on chip," in VLSID '04: Proceedings of the 17th International Conference on VLSI Design. Washington, DC, USA: IEEE Computer Society, 2004, p. 693.

[9] K. Goossens, J. Dielissen, and A. Radulescu, "The AEthereal network on chip: Concepts, architectures, and implementations," IEEE Design and Test of Computers, vol. 22, no. 5, pp. 21-31, Sept-Oct 2005.

[10] J. J. Bainbridge, Asynchronous System on Chip Interconnect. Secaucus, NJ, USA: Springer-Verlag New York, Inc., 2002.

[11] L.-S. P. Hangsheng Wang and S. Malik, "A technology-aware and energy-oriented topology exploration for on-chip networks," in Design, Automation and Test in Europe, March 2005, pp. 1238-1243.

[12] K. Lee, S.-J. Lee, and H.-J. Yoo, "Low-power network-on-chip for high-performance soc design," Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 14, no. 2, pp. 148-160, February 2006.

[13] A. Jantsch, "Standards for NoC: What can we gain?" Future Interconnects and Networks on Chip Workshop - DATE 2006.

[14] E. Rijpkema, K. G. W. Goossens, A. Radulescu, J. Dielissen, J. van Meerbergen, P. Wielage, and E. Waterlander, "Trade offs in the design of a router with both guaranteed and best-effort services for networks on chip," in DATE '03: Proceedings of the conference on Design, Automation and Test in Europe. Washington, DC, USA: IEEE Computer Society, 2003, p. 10350.
