Delay Bound Optimization in NoC Using a Discrete Firefly Algorithm

Du, Gaoming; Tian, Chao; Li, Zhenmin; Zhang, Duoli; Zhang, Chuan; Wang, Xiaolei; Yin, Yongsheng

doi:10.3390/electronics8121507


by Gaoming Du ¹,*, Chao Tian ¹, Zhenmin Li ¹, Duoli Zhang ¹, Chuan Zhang ²,³, Xiaolei Wang ¹ and Yongsheng Yin ¹

¹ Institute of VLSI Design, Hefei University of Technology, Hefei 230601, China

² National Mobile Communications Research Laboratory, Southeast University, Nanjing 211189, China

³ Purple Mountain Laboratories, Nanjing 211111, China

* Author to whom correspondence should be addressed.

Electronics 2019, 8(12), 1507; https://doi.org/10.3390/electronics8121507

Submission received: 25 October 2019 / Revised: 27 November 2019 / Accepted: 29 November 2019 / Published: 9 December 2019

(This article belongs to the Section Computer Science & Engineering)


Abstract: The delay bound in a system on chip (SoC) represents the worst-case traversal time of on-chip communication. In network-on-chip (NoC)-based SoCs, optimizing the delay bound is challenging for two reasons: (1) the delay bound is hard to obtain by traditional methods such as simulation; (2) the delay bound changes with the application mapping. In this paper, we propose a delay bound optimization method using a discrete firefly algorithm (DBFA). First, we present a formal analytical delay bound model based on network calculus for both unipath and multipath routing in NoCs. We then set every flow in the application as the target flow and calculate the delay bound using the proposed model. Finally, we adopt the firefly algorithm (FA) as the optimization method for minimizing the delay bound. We used industry patterns (video object plane decoder (VOPD), multiwindow display (MWD), etc.) to verify the effectiveness of the delay bound optimization method. Experiments show that the proposed method is both effective and reliable, with a maximum optimization of 42.86%.

Keywords: delay bound; network calculus; network on chip; firefly algorithm

1. Introduction

In modern system-on-chip (SoC) design, average [1,2,3] and worst-case [4,5,6,7,8,9,10,11,12,13,14,15] performance are two essential metrics for the communication architecture. Quality-of-service (QoS) in network on chip (NoC) represents the worst-case traversal time of on-chip communications. In NoC-based SoCs, optimizing the delay bound is challenging. Formal approaches have been proposed for delay bound modeling [16,17,18,19,20,21]. Wang et al. [16] used signal flow charts and signal time measures to analyze the upper bound of the transmission delay time. Ren et al. [17] analyzed the communication delay bound for individual flows in NoC using an improved asymmetric multichannel router structure. Lu et al. [19,20] modeled a classic input-queuing virtual channel router and a unified platform in Simulink based on xMAS to improve the accuracy of NoC performance analysis.

As an effective tool, network calculus has been applied in NoC performance analysis [21,22]. How to improve the tightness of a NoC performance model based on network calculus has become one of the important research directions. Saggio et al. [23] validated the effectiveness of network calculus for delay bound modeling. To improve the accuracy of a NoC delay bound, Zhao et al. [24] used simulated annealing (SA) to automatically calculate the simulation parameters. They also extended the method from delay bound to a backlog bound tightness study [25], where SA was replaced by an adaptive simulated annealing (ASA) algorithm, with higher efficiency and precision.

The works above show that heuristic approaches can be an effective tool for worst-case performance analysis, which motivated us to improve the delay bound tightness under different mappings. Rather than optimizing a specific configuration, we focused on optimizing the delay bound for a specific application mapping. We observed that different mapping schemes have different delay bounds for NoC traffic flows. When the mapping scheme changes, the contention scenarios also change. However, delay bound optimization is challenging not only because such problems are hard, but also because the model must be both accurate and fast enough for solution exploration.

The firefly algorithm (FA) was inspired by the flashing behavior of fireflies and was first put forward by Xin-She Yang [26]. In recent years, FA has been widely used to optimize NoC design in design spaces [27,28,29]. In this paper, we propose a delay bound optimization method using firefly algorithms (DBFA). Instead of improving the simulation parameter accuracy and analyzing network contention, DBFA attempts to find the optimal mapping scheme directly. The experimental results show that the mapping scheme determined by FA is closer to the optimal mapping scheme than that determined by discrete particle swarm optimization (DPSO) [30]. Our contributions are summarized as follows:

We modeled the worst-case delay bound in network on chip using network calculus. It is suitable for both unipath and multipath routing NoCs.
We adopted FA to optimize the end-to-end delay bound. FA shows high efficiency in many other fields, such as scheduling. To the best of our knowledge, this is the first work using FA for delay bound optimization.
We performed extensive experiments using both synthetic and industry patterns. We also integrated our delay bound model into DPSO, a state-of-the-art algorithm, for performance comparison.

2. Delay Bound Analysis Using Network Calculus

Simulation experiments [31] can easily obtain latency and communication costs, but it is difficult to obtain a delay bound. The network calculus fills the gap. In Figure 1, $R(t)$ and $R^{*}(t)$ represent the actual arrival curve and service curve of NoC, respectively. A linear function $α(t)$ covers the maximum rate of $R(t)$; $β(t)$ represents the minimum service rate of $R^{*}(t)$. Both are defined as follows:

$$α(t) = \begin{cases} r t + b, & t > 0 \\ 0, & \text{otherwise} \end{cases}$$

$$β(t) = \begin{cases} R (t - T), & t > T \\ 0, & \text{otherwise} \end{cases}$$

where $r$ is the sustainable arrival rate, $b$ is the burstiness, $R$ is the minimum service rate, and $T$ is the maximum processing latency. In particular, $R$ is usually greater than $r$ to ensure no packet dropping in NoC. Delay bound $\bar{D}(t)$ of a flow $f_{i}$ is calculated by finding the greatest horizontal distance $h(α, β)$ between its arrival curve $α(t)$ and the system equivalent service curve $β(t)$. Hence the delay bound is derived as follows [9]:

$$\bar{D}(t) = h(α, β) = T + \frac{b}{R}. \tag{1}$$

For clarity, all the symbols are listed in Table 1.
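Equation (1) can be illustrated with a minimal Python sketch. The function names and the numeric values are illustrative assumptions, not taken from the paper; the second function only approximates the horizontal distance $h(α, β)$ on a time grid as a sanity check of the closed form.

```python
def delay_bound(r, b, R, T):
    """Closed-form bound of Equation (1): T + b / R (requires R >= r)."""
    if R < r:
        raise ValueError("service rate R must be at least the arrival rate r")
    return T + b / R


def delay_bound_numeric(r, b, R, T, horizon=10.0, steps=1000):
    """Grid approximation of the largest horizontal distance between
    alpha(t) = r*t + b and beta(t) = R*(t - T)."""
    worst = 0.0
    for k in range(1, steps + 1):
        t = horizon * k / steps
        # Earliest time u at which beta(u) catches up with alpha(t).
        u = T + (r * t + b) / R
        worst = max(worst, u - t)
    return worst


# Hypothetical parameters (flits and cycles), chosen only for illustration.
closed_form = delay_bound(r=0.2, b=4.0, R=1.0, T=2.0)
approximate = delay_bound_numeric(r=0.2, b=4.0, R=1.0, T=2.0)
```

Because $R \geq r$, the horizontal distance is largest just after $t = 0$, which is why the grid value approaches the closed form from below.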

3. Delay Bound Optimization Using Discrete Firefly Algorithms

3.1. Problem Formulation

We first present a general mapping process through the following definitions to optimize the total communication cost.

To map an application characteristic graph $G(V, E)$ to a NoC topology graph $P(U, F)$, the mapping function $map()$ should satisfy the following constraints:

$$v_{i} \in V \Rightarrow map(v_{i}) \in U; \tag{2}$$

$$v_{i} \neq v_{j} \Leftrightarrow map(v_{i}) \neq map(v_{j}); \tag{3}$$

$$size(V) \leq size(U); \tag{4}$$

$$b_{i, j} \leq B_{i, j}. \tag{5}$$

The smaller the delay bound, the better the mapping results. In this paper, we assume the NoC topology is a mesh. Therefore, the delay bound is usually calculated by the following equation.

$$DelayBound = \max(D_{1}, D_{2}, \ldots, D_{n}), \tag{6}$$

where $D_{n}$ is the delay bound of every target flow, whose calculation is shown in the rest of this section. Thus, the optimization target is $\min DelayBound$.
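As a rough illustration of constraints (2) to (4) and Equation (6), the following Python sketch checks that a mapping is a valid one-to-one placement and takes the maximum over per-flow bounds. The dictionary representation, the function names, and the numbers are assumptions made for this example; the bandwidth constraint (5) is left out because its inputs are not specified here.

```python
def is_valid_mapping(mapping, tasks, nodes):
    """Check constraints (2)-(4) for a task -> node dictionary."""
    if len(tasks) > len(nodes):                       # constraint (4)
        return False
    for task in tasks:                                # constraint (2)
        if task not in mapping or mapping[task] not in nodes:
            return False
    placed = [mapping[task] for task in tasks]
    return len(set(placed)) == len(placed)            # constraint (3)


def mapping_delay_bound(flow_bounds):
    """Equation (6): the worst per-flow delay bound of one mapping."""
    return max(flow_bounds)


# Hypothetical 2 x 2 mesh with three tasks.
nodes = {0, 1, 2, 3}
tasks = ["t0", "t1", "t2"]
good = {"t0": 0, "t1": 3, "t2": 1}
bad = {"t0": 0, "t1": 0, "t2": 1}      # two tasks share node 0
ok_good = is_valid_mapping(good, tasks, nodes)
ok_bad = is_valid_mapping(bad, tasks, nodes)
worst = mapping_delay_bound([12.0, 30.5, 18.0])
```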

3.2. Delay Bound Model

We present the delay bound calculation model in Algorithm 1. The key idea of the delay bound analysis is to obtain the end-to-end equivalent service curve (ESC) by considering all kinds of resource sharing, as shown in lines 3–13 and summarized in four steps.

Step 1: Use the function ClassifyConFlow() to classify the flow contentions into a unified representation $f_{[a, b]}$, where $a$ and $b$ are the traffic injection and output router nodes, respectively. These flows could be a single flow or an aggregate flow consisting of several contention flows; i.e., both unipath and multipath routing are supported in this model.

Step 2: Calculate the arrival curve of $f_{[a, b]}$ at node $a$.

Step 3: Calculate the equivalent service curve for the target flow and obtain the delay bound.

Step 4: Repeat Steps 1–3 to find the delay bound of every single flow and select the maximum value as the delay bound of the current mapping.

Algorithm 1 Delay bound calculation

1: Input $β_{i}(t) = R_{i}(t - T_{i})^{+}$, $1 \leq i \leq X \times Y$, $α_{f_{j}} = r_{j} t + b_{j}$, $1 \leq j \leq FlowNum$
2: Output $D_{max}$
3: for the flow $f_{k} \in TagFlowSet$
4: {
5:   $ESC(f_{k})$ // Calculate the equivalent service curve of flow $f_{k}$.
6:   $NodeSet()$ // Find all nodes that $f_{k}$ passes through.
7:   $CClassSet = ClassifyConFlow()$ // Classify all the contention flows.
8:   for $f_{[a, b]} \in CClassSet$
9:   {
10:     $ACSet = AC(f_{[a, b]})$
11:     if $CrossContention$
12:       Cut all cross-contention flows // Treatment of the cross-contention situation.
13:       $ACSet = AC\_CrossCon(f_{[a, b]}, ACSet)$ // Calculate the arrival curve of the cut flows and combine flows of the same type.
14:     else
15:       $ESC = ESCNoCrossCon(f_{[a, b]}, ACSet)$ // Treatment of other contention situations.
16:   }
17:   $Delay(f_{k}, ESC)$ // Obtain its delay bound.
18: }
19: $D_{max} = \max(Delay(f_{k}, ESC))$ // Obtain the worst delay bound for one scheme.
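The building blocks behind an equivalent service curve can be sketched with two standard network calculus results for rate-latency servers: concatenating routers along a path, and the leftover service seen by a target flow that shares a router with one contention flow. This is only a simplified illustration with hypothetical class and function names; it does not reproduce the contention classification or the cross-contention treatment of Algorithm 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateLatency:
    """Service curve beta(t) = R * (t - T)^+."""
    R: float
    T: float


@dataclass(frozen=True)
class TokenBucket:
    """Arrival curve alpha(t) = r * t + b."""
    r: float
    b: float


def concatenate(first, second):
    """Two servers in sequence: minimum rate, summed latencies."""
    return RateLatency(min(first.R, second.R), first.T + second.T)


def leftover(service, cross):
    """Service left for the target flow when a contention flow shares
    the server: rate R - r, latency (R*T + b) / (R - r)."""
    if cross.r >= service.R:
        raise ValueError("contention flow saturates the server")
    rate = service.R - cross.r
    return RateLatency(rate, (service.R * service.T + cross.b) / rate)


def delay_bound(flow, service):
    """Equation (1) applied to the equivalent service curve."""
    return service.T + flow.b / service.R


# Hypothetical two-router path; the first router is shared with one flow.
router = RateLatency(R=1.0, T=2.0)
shared = leftover(router, TokenBucket(r=0.5, b=1.0))
esc = concatenate(shared, router)
bound = delay_bound(TokenBucket(r=0.25, b=2.0), esc)
```

Computing the bound once on the end-to-end curve, rather than summing per-router bounds, means the burst term $b/R$ is paid only once.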

3.3. Distance Calculation

The calculation of distance is as follows.

$$s_{mn} = \sum_{i = 0}^{Tnum} \left( | x_{i}^{m} - x_{i}^{n} | + | y_{i}^{m} - y_{i}^{n} | \right), \tag{7}$$

where $s_{mn}$ is the Manhattan distance between fireflies $X_{m}$ and $X_{n}$, and $(x_{i}^{m}, y_{i}^{m})$ represents the coordinates of the mapping scheme of firefly $m$ in task $i$. Accordingly, $(x_{i}^{n}, y_{i}^{n})$ represents the coordinates of the mapping scheme of firefly $n$ in task $i$, and $Tnum$ is the total number of tasks.

To bring the distance and the absorption coefficient to the same order of magnitude, we apply min-max normalization to the distance between two fireflies.

$$S_{mn} = \frac{s_{mn} - s_{min}}{s_{max} - s_{min}}, \tag{8}$$

where $s_{min}$ is the minimum distance between two fireflies and $s_{max}$ is the maximum distance for the target firefly. To make this clearer, take Figure 2 as an example. There are two fireflies $X_{m}$ and $X_{n}$ in Figure 2a,b, respectively.

For firefly $X_{m}$, task numbers 1, 3, and 5 are mapped to (4, 4) (node F), (3, 4) (node B), and (2, 4) (node 7), respectively. For $X_{n}$, the positions of the above three tasks are (2, 1) (node 4), (2, 2) (node 5), and (2, 3) (node 6), respectively. According to Equation (7), we can calculate $s_{1} = | 4 - 2 | + | 4 - 1 | = 5$, $s_{3} = | 3 - 2 | + | 4 - 2 | = 3$, and $s_{5} = | 2 - 2 | + | 4 - 3 | = 1$. The positions of task numbers 0, 2, and 4 have not changed, so $s_{0}$, $s_{2}$, and $s_{4}$ are all 0. Finally, we can calculate the distance between fireflies $X_{m}$ and $X_{n}$ as $s_{mn} = 0 + 5 + 0 + 3 + 0 + 1 = 9$.

Figure 2c shows the theoretical maximum distance for firefly $X_{m}$, which does not actually exist. The maximum distance is an ideal upper-bound value, defined by mapping each task to its theoretical farthest corner node; e.g., tasks 0 and 2 are both mapped to node 15. In this example, $s_{max} = 6 + 5 + 5 + 5 + 5 + 6 = 32$. As a result, the distance between fireflies $X_{m}$ and $X_{n}$ is $S_{mn} = \frac{9}{32} = 0.28125$.
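The worked example can be reproduced with a short Python sketch of Equations (7) and (8). The coordinates of tasks 1, 3, and 5 and the value $s_{max} = 32$ come from the text; the coordinates used for the unchanged tasks 0, 2, and 4 are placeholders (they are not stated in the text and do not affect the result), and $s_{min} = 0$ is an assumption consistent with $S_{mn} = 9/32$.

```python
def manhattan_distance(x_m, x_n):
    """Equation (7): per-task Manhattan distances summed over all tasks."""
    return sum(
        abs(x_m[task][0] - x_n[task][0]) + abs(x_m[task][1] - x_n[task][1])
        for task in x_m
    )


def normalize(s, s_min, s_max):
    """Equation (8): min-max normalization of a distance."""
    return (s - s_min) / (s_max - s_min)


# task -> (x, y); tasks 0, 2, 4 use placeholder coordinates.
x_m = {0: (1, 1), 1: (4, 4), 2: (1, 2), 3: (3, 4), 4: (1, 3), 5: (2, 4)}
x_n = {0: (1, 1), 1: (2, 1), 2: (1, 2), 3: (2, 2), 4: (1, 3), 5: (2, 3)}

s_mn = manhattan_distance(x_m, x_n)
S_mn = normalize(s_mn, s_min=0, s_max=32)
```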

3.4. Refreshing of Firefly Locations

The original firefly location refreshing formula [26] is as follows:

$$X_{i}(t + 1) = X_{i}(t) + β(r) \cdot (X_{j}(t) - X_{i}(t)) + α \cdot Random(). \tag{9}$$

In our approach, we rewrite the firefly refreshing formula as follows.

$$X_{i}(t + 1) = (1 - β(r)) \cdot X_{i}(t) + β(r) \cdot X_{j}(t) + α \cdot Random(). \tag{10}$$

Thus, this formula consists of the following two parts:

$β$ movement. Fireflies refresh because of the attractiveness between any two fireflies; it is related to the attractiveness $β(r)$, so we call it $β$ movement.
$α$ movement. Fireflies refresh because of the random movement; it is related to the maximal random step $α$, so we call it $α$ movement.

Each firefly moves towards the brighter fireflies (the mapping schemes with a lower delay bound) through $β$ movement. Afterwards, each firefly moves randomly through $α$ movement to find a better mapping scheme. The $α$ movement rule differs between the non-optimal fireflies and the optimal firefly, so we call the former $α1$ movement and the latter $α2$ movement.

1. $α1$ movement

To perform the $α1$ movement of firefly $X_{i}(t_{i})$ after the $β$ movement, we defined a set $w$ to record the positions at which the elements of $X_{i}(t_{i})$ and $X_{j}(t_{j})$ differ. Then we chose $α$ positions from $w$ as exchange positions and exchanged each pair of elements with probability $p$ to finish the $α1$ movement.

2. $α2$ movement

The $α2$ movement prevents the optimal firefly from falling into a local optimum and increases the exploring capability of DBFA. We chose $α$ positions in the local optimum mapping scheme and exchanged each pair of elements with probability $q$ to finish the $α2$ movement. The probability $q$ was randomly generated from a uniform distribution.
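One possible reading of the discrete movements is sketched below in Python. A mapping is assumed to be a list indexed by node, holding a task number or -1 for an empty node; the function names and the interpretation of "each two elements" as consecutive pairs of the chosen positions are assumptions made for this example, not the authors' implementation.

```python
import random


def beta_move(x_i, x_j, beta, rng):
    """Move x_i towards x_j: at each position holding a task in x_j,
    swap that task into place in x_i with probability beta."""
    x = list(x_i)
    for pos, task in enumerate(x_j):
        if task >= 0 and x[pos] != task and rng.random() < beta:
            other = x.index(task)          # where the task currently sits
            x[pos], x[other] = x[other], x[pos]
    return x


def alpha_move(x, pool, alpha, prob, rng):
    """Pick up to alpha positions from pool and swap consecutive pairs
    of them with probability prob (alpha-1 or alpha-2 movement)."""
    x = list(x)
    chosen = rng.sample(pool, min(alpha, len(pool)))
    for a, b in zip(chosen, chosen[1:]):
        if rng.random() < prob:
            x[a], x[b] = x[b], x[a]
    return x


rng = random.Random(1)
x_i = [3, 1, 2, -1, 0]
x_j = [0, 1, 2, 3, -1]
moved = beta_move(x_i, x_j, beta=1.0, rng=rng)

# alpha-1: candidate positions are those that still differ from x_j.
partly = beta_move(x_i, x_j, beta=0.5, rng=rng)
w = [k for k in range(len(partly)) if partly[k] != x_j[k]]
after_alpha1 = alpha_move(partly, w, alpha=3, prob=0.5, rng=rng)

# alpha-2: any position of the best firefly may be chosen.
after_alpha2 = alpha_move(x_j, list(range(len(x_j))), alpha=3,
                          prob=rng.random(), rng=rng)
```

Since every step is a swap inside one list, each result remains a valid one-to-one mapping, which is what constraint (3) requires.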

3.5. Pseudo Code of Firefly Algorithm

The pseudo code of DBFA is shown in Algorithm 2. All the steps described above are covered in this algorithm; i.e., defining fireflies, calculating distance, refreshing locations, and optimizing the delay bounds.

For algorithm complexity, if there are $n$ fireflies in the colony, we obtain a local optimum firefly in every generation when executing Algorithm 2. For the other fireflies, the calculation process of each firefly in lines 5–14 would be carried out $(n - 1)$ times. Thus the number of iterations over all fireflies is $(n - 1)^{2}$, and the complexity of the FA is $O((n - 1)^{2})$. Therefore, the complexity of the whole DBFA program is $O\left(m (n - 1)^{2} \left(\frac{j \times (j + 1)}{2}\right)^{l}\right)$.

Algorithm 2 DBFA

1: Input The application characteristic graph $G_{f}(PE, E_{f})$, NoC topology graph $D_{f}(R, P_{f})$, firefly colony number $Fnum$, maximum iterations $GMax$, absorption coefficient $γ$, maximum attractiveness $β_{0}$, and maximal random step $α$.
2: Output Global optimum firefly $X_{min}^{global}$
3: Set parameters to initialize fireflies
4: for all $G < GMax$
5: {
6:   $FindGlobal() = \min M(Dmax)$ // Search for the global optimum firefly $X_{min}^{global}$ with a minimized delay bound.
7:   for all $i < Fnum$
8:   {
9:     for all $j < Fnum$
10:     {
11:       $S = ComputeDistance(X_{ij})$ // Compute the distance between $X_{i}(t_{i})$ and $X_{j}(t_{j})$.
12:       if $\frac{1}{delay_{i}} < \frac{1}{delay_{j}} \times \exp(-γ S_{ij})$ // Move condition: $X_{i}(t_{i})$ moves towards $X_{j}(t_{j})$.
13:       {
14:         Compute the attractiveness between $X_{i}(t_{i})$ and $X_{j}(t_{j})$.
15:         $X_{i}(t_{i})$ // Execute $β$ movement.
16:         $X_{i}(t_{i})$ // Execute $α1$ movement.
17:       }
18:     }
19:   }
20:   $X_{min}^{global}$ // Execute $α2$ movement.
21: }
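The loop structure of Algorithm 2 can be sketched in Python as follows. This is a simplified illustration under several assumptions: the mapping is a node-indexed list with -1 for empty nodes, `toy_delay` is a hypothetical stand-in for the delay bound model of Algorithm 1, $s_{max}$ is replaced by a crude upper bound, and the swap probability `p` and all default parameters are arbitrary.

```python
import math
import random


def distance(x_i, x_j, width):
    """Manhattan distance summed over all mapped tasks (Equation (7))."""
    total = 0
    for task in x_i:
        if task < 0:                       # -1 marks an empty node
            continue
        r1, c1 = divmod(x_i.index(task), width)
        r2, c2 = divmod(x_j.index(task), width)
        total += abs(r1 - r2) + abs(c1 - c2)
    return total


def beta_move(x_i, x_j, beta, rng):
    x = list(x_i)
    for pos, task in enumerate(x_j):
        if task >= 0 and x[pos] != task and rng.random() < beta:
            other = x.index(task)
            x[pos], x[other] = x[other], x[pos]
    return x


def alpha_move(x, pool, alpha, prob, rng):
    x = list(x)
    chosen = rng.sample(pool, min(alpha, len(pool)))
    for a, b in zip(chosen, chosen[1:]):
        if rng.random() < prob:
            x[a], x[b] = x[b], x[a]
    return x


def dbfa(delay_fn, n_tasks, width, height, fnum=8, gmax=30,
         gamma=0.3, beta0=1.0, alpha=3, p=0.5, seed=0):
    rng = random.Random(seed)
    n_nodes = width * height
    base = list(range(n_tasks)) + [-1] * (n_nodes - n_tasks)
    swarm = [rng.sample(base, n_nodes) for _ in range(fnum)]
    s_max = n_tasks * (width + height - 2)     # crude bound (assumption)
    best = min(swarm, key=delay_fn)
    for _ in range(gmax):
        for i in range(fnum):
            for j in range(fnum):
                d_i, d_j = delay_fn(swarm[i]), delay_fn(swarm[j])
                s = distance(swarm[i], swarm[j], width) / s_max
                if 1 / d_i < (1 / d_j) * math.exp(-gamma * s):
                    beta = beta0 * math.exp(-gamma * s ** 2)
                    moved = beta_move(swarm[i], swarm[j], beta, rng)
                    w = [k for k in range(n_nodes)
                         if moved[k] != swarm[j][k]]
                    swarm[i] = alpha_move(moved, w, alpha, p, rng)
        leader = min(range(fnum), key=lambda k: delay_fn(swarm[k]))
        if delay_fn(swarm[leader]) < delay_fn(best):
            best = list(swarm[leader])
        # alpha-2 movement on the current best firefly, with random q.
        swarm[leader] = alpha_move(swarm[leader], list(range(n_nodes)),
                                   alpha, rng.random(), rng)
    return best


def toy_delay(x, width=4, flows=((0, 1), (1, 2), (2, 3), (3, 4), (4, 5))):
    """Hypothetical cost: 1 + total hop count of a chain of flows."""
    total = 1
    for a, b in flows:
        r1, c1 = divmod(x.index(a), width)
        r2, c2 = divmod(x.index(b), width)
        total += abs(r1 - r2) + abs(c1 - c2)
    return total


best = dbfa(toy_delay, n_tasks=6, width=4, height=4)
```

The best mapping is copied before the $α2$ movement perturbs it, so the returned result never gets worse from one generation to the next.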

3.6. Example of the Delay Bound Optimizing Process

To further understand the movement procedure, we again take Figure 2 as an example. We calculated the delay bound of firefly $X_{m}$ in Section 3 as $\bar{D}_{m} = 1139.833$. Using the same method, we can also obtain the delay bound of firefly $X_{n}$ as $\bar{D}_{n} = 366$. The distance between $X_{m}$ and $X_{n}$ is $S_{mn} = 0.3125$, so we assign $γ = 0.3$, which satisfies the moving condition

$$\frac{1}{1139.833} < \frac{1}{366} \times \exp(-0.3 \times 0.3125).$$

Therefore firefly $X_{m}$ moves towards $X_{n}$.

The attractiveness between fireflies can be calculated with the following formula.

$$β(r) = β_{0} e^{-γ s^{2}}. \tag{11}$$

We define $β_{0} = 1$, so the attractiveness between $X_{m}$ and $X_{n}$ is $β = 1 \times e^{-0.3 \times 0.3125^{2}} = 0.97$.

As Figure 3 shows, compared with the mapping scheme of $X_{n}$, there are three different positions in $X_{m}$ (disregarding the positions with the value $-1$). The first different value is 1, so we look for the location of 1 in $X_{m}$. With probability $β$, we exchange the value in the last position with the value in the fifth position. We repeat this step until all the different positions have been made the same. This completes the $β$ movement.

The firefly $X_{m}$, having gone through the $β$ movement, then completes the $α1$ movement. This step makes sure the firefly moves towards a brighter firefly (one with a lower delay bound). We supposed firefly $X_{n}$ was the local optimum firefly, so we made it move according to the $α2$ movement rules. As Figure 4 shows, we randomly chose $α$ positions in the local optimum firefly and randomly produced a change probability $q$. In this case, we supposed that $q = 0.57$ and $α = 3$ and that the three positions were <1,5,8>; the values at these positions are exchanged in turn with probability $0.57$. This completes the $α2$ movement.
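The numbers in this example can be checked with a few lines of Python. The function names are illustrative; the inputs ($\bar{D}_{m} = 1139.833$, $\bar{D}_{n} = 366$, $γ = 0.3$, $S_{mn} = 0.3125$, $β_{0} = 1$) are the ones quoted in the text.

```python
import math


def should_move(delay_i, delay_j, gamma, s):
    """Move condition of Algorithm 2: firefly i is attracted by firefly j."""
    return 1.0 / delay_i < (1.0 / delay_j) * math.exp(-gamma * s)


def attractiveness(beta0, gamma, s):
    """Equation (11): beta = beta0 * exp(-gamma * s^2)."""
    return beta0 * math.exp(-gamma * s ** 2)


m_moves_to_n = should_move(1139.833, 366.0, gamma=0.3, s=0.3125)
n_moves_to_m = should_move(366.0, 1139.833, gamma=0.3, s=0.3125)
beta = attractiveness(beta0=1.0, gamma=0.3, s=0.3125)
```

Only the firefly with the larger delay bound satisfies the condition, so $X_{m}$ moves towards $X_{n}$ and not the other way round.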

4. Experiments and Results

We performed experiments for the following three purposes: (1) proving the effectiveness of DBFA in delay bound optimization, (2) comparing the results with state-of-the-art work DPSO, and (3) verifying the tightness of DBFA compared to a simulation.

4.1. Setting Up

We mapped several applications to a mesh NoC to test the reliability of our method. The characteristic graphs of industry patterns [30] such as picture-in-picture (PIP) [32], multiwindow display (MWD) [33], 263DEC MP3DEC [34], 263ENC MP3DEC [34], MP3ENC MP3DEC [34], video object plane decoder (VOPD) [35], and DVOPD [36] are shown in Figure 5, Figure 6a, and Figure 7. The mesh NoC scale was $4 \times 4$, and the whole process was simulated in C++ and run on Ubuntu 12.04. The experimental parameter settings are shown in Table 2. $Fnum$ represents the total number of fireflies in the firefly group and $GMax$ represents the maximum number of iterations. $γ$ and $α$ represent the absorption coefficient and the maximal random step, respectively.

4.2. Experiment Results

VOPD has 21 flows and is the largest and most complex characteristic graph in this paper, so we take it as the example to demonstrate the optimization performance of DBFA. The mapping schemes obtained with the RAND, DPSO, and DBFA methods are shown in Figure 6b–d, where circles represent tasks and rectangles represent network nodes. The mapping schemes obtained by the three methods have corresponding delays of 55, 46, and 42, respectively.

The minimum worst-case delay bound in every generation is shown in Figure 8. At the beginning, the worst-case delay bound was 55 cycles; after 400 optimization iterations, the delay bound was 42 cycles, a reduction of $23.64\%$. Compared with DPSO, DBFA can avoid falling into a local optimum. DBFA is designed for NoC: by introducing $α$ movement and $β$ movement, it can identify and jump out of local optima. This is an important reason why DBFA is more efficient than DPSO. The delay bound of each flow is shown in Figure 9.
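As a quick check, the reduction follows directly from the two delay bounds reported for VOPD (55 cycles initially, 42 cycles with DBFA):

$$\frac{55 - 42}{55} = \frac{13}{55} \approx 23.64\%.$$

Applying the same calculation to the DPSO result of 46 cycles, and assuming the same 55-cycle baseline, gives $\frac{55 - 46}{55} = \frac{9}{55} \approx 16.36\%$.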

To strengthen the comparison, we added DVOPD to the original six industry patterns. The scale of DVOPD is much larger than that of any other pattern. The experimental results are shown in Figure 10. Although the scale of the NoC has been greatly expanded, DBFA still performs better than DPSO.

For the other applications, the optimized results of every flow are shown in Figure 11, Figure 12, Figure 13, Figure 14 and Figure 15, and the largest delay is the delay bound. The PIP application deserves special explanation: DBFA achieves almost no optimization because PIP contains eight tasks and eight cores, seven of which are the same. The application is so simple that there is little room for optimization.

4.3. Scalability Analysis

The seven industry patterns in the experiments can be divided into small-scale, medium-scale and large-scale applications. Specifically, the scale of NoC in PIP is only 4 × 2, which is the smallest among all patterns, and the scale of NoC in DVOPD is 8 × 4, which is the largest among all patterns. We calculated the delay bound of PIP in NoCs of different scales. The experimental results are shown in Figure 16. Experimental data shows that, although the scale of NoC varies greatly, the delay bound is almost unchanged. This shows that DBFA and DPSO have strong stability in delay bound optimization.

4.4. Running Time of CPU

We also studied the feasibility of DBFA. We mapped these applications onto NoCs of varying scales and measured the total CPU (Intel i5-8400) running time, from initialization to obtaining the optimal mapping schemes. The results are shown in Table 3: as the number of flows increases, the CPU running time increases by a slight to moderate amount. The optimization time of DBFA is lower than that of DPSO, which shows that DBFA is more efficient at optimizing delay bounds.

4.5. The Comparison of Optimized and Simulation Results

In order to verify the validity of DBFA, we performed simulations and compared the optimized analytical results with the simulation results, using the MWD application. In the simulations, we used Verilog to design a $4 \times 4$ NoC; the global clock ran at 50 MHz and each router node handled one flit in two cycles, so the maximum delay of data in the network was four cycles. The experimental results are shown in Figure 17. This figure supports the validity of DBFA for optimizing the delay bound. For several flows (flow 8 and flow 11), the difference between the theoretical results and the simulation results was minor, which also indicates the tightness of the analytical results. It is also important to point out that for some flows, such as flow 5 and flow 10, there was a big gap. This is partly because the simulation time and flow contention were not well explored during the simulation.

5. Conclusions

Optimizing a delay bound in NoC is both important and hard. When the application mapping changes, the contentions between flows also change, which results in a different delay bound. In this paper, we first derived an analytical model for end-to-end flows in NoC, which can automatically compute the delay bound for the target flow when given a specified mapping. Then, we proposed a firefly algorithm for application mapping, with delay bound minimization as the optimization objective. We call this framework DBFA. Experiments showed that the proposed DBFA not only optimizes the delay bound for a specified application, with an optimization rate of up to 42.86%, but also has a fast running time and tight accuracy.

Author Contributions

Data curation, C.T.; Formal analysis, D.Z.; Funding acquisition, G.D.; Methodology, C.T. and Z.L.; Project administration, G.D.; Resources, D.Z.; Software, Z.L.; Supervision, G.D.; Validation, C.T.; Visualization, C.Z.; Writing—original draft, C.T.; Writing—review & editing, G.D., Z.L., D.Z., C.Z., X.W. and Y.Y.

Funding

This work was supported in part by the National Key Research and Development Program under grant 2018YFB2202604, the University Synergy Innovation Program of Anhui Province under grant GXXT-2019-030, NSFC under grants 61871115 and 61501116, and Jiangsu Provincial NSF for Excellent Young Scholars under grant BK20180059.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Ogras, U.Y.; Bogdan, P.; Marculescu, R. An analytical approach for network-on-chip performance analysis. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2010, 29, 2001–2013.
2. Bogdan, P. Workload characterization and its impact on multicore platform design. In Proceedings of the IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES + ISSS 2010), Scottsdale, AZ, USA, 24–29 October 2010; pp. 231–240.
3. Wu, Y.; Min, G.; Ould-Khaoua, M.; Yin, H.; Wang, L. Analytical modelling of networks in multicomputer systems under bursty and batch arrival traffic. J. Supercomput. 2010, 51, 115–130.
4. Azarnova, T.V.; Barkalov, S.A.; Ukhlova, V.V. Estimation of time characteristics of systems with network topology and stochastic processes of functioning. In Proceedings of the International Conference “Applied Mathematics, Computational Science and Mechanics: Current Problems” (AMCSM 2019), Athens, Greece, 28–30 December 2019; p. 1203.
5. Kiasari, A.E.; Hessabi, S.; Sarbazi-Azad, H. PERMAP: A performance-aware mapping for application-specific SoCs. In Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors (ASAP 2008), Leuven, Belgium, 2–4 July 2008; pp. 73–78.
6. Qian, Z.; Juan, D.; Bogdan, P.; Tsui, C.Y.; Marculescu, D.; Marculescu, R. A Support Vector Regression (SVR)-Based Latency Model for Network-on-Chip (NoC) Architectures (TCAD). IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2015, 35, 471–484.
7. Thiele, L.; Chakraborty, S.; Naedele, M. Real-time calculus for scheduling hard real-time systems. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS 2000), Geneva, Switzerland, 28–31 May 2000; pp. IV-101–IV-104.
8. Leontyev, H.; Chakraborty, S.; Anderson, J.H. Multiprocessor extensions to real-time calculus. In Proceedings of the Real-Time Systems Symposium (RTSS 2009), Washington, DC, USA, 1–4 December 2009; pp. 410–421.
9. Cruz, R.L. A calculus for network delay. I. Network elements in isolation. IEEE Trans. Inform. Theory 1991, 37, 114–131.
10. Qian, Y.; Lu, Z.; Dou, W. Analysis of communication delay bounds for network on chips. In Proceedings of the Asia and South Pacific Design Automation Conference (ASP-DAC 2009), Yokohama, Japan, 19–22 January 2009; pp. 7–12.
11. Du, G.; Zhang, C.; Lu, Z.; Saggio, A.; Gao, M. Worst-case performance analysis of 2-D mesh NoCs using multi-path minimal routing. In Proceedings of the ACM International Conference on Hardware/Software-Codesign and System Synthesis (CODES + ISSS 2012), Tampere, Finland, 7–12 October 2012; pp. 123–132.
12. Giroudot, F.; Mifdaoui, A. Buffer-aware worst-case timing analysis of wormhole NoCs using network calculus. In Proceedings of the IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2018), Porto, Portugal, 11–13 April 2018; pp. 37–48.
13. Jafari, F.; Lu, Z.; Jantsch, A. Least Upper Delay Bound for VBR Flows in Networks-on-Chip with Virtual Channels. ACM Trans. Des. Autom. Electron. Syst. 2015, 20.
14. Jiang, Y.; Liu, Y. Stochastic Network Calculus; Springer: London, UK, 2009.
15. Du, G.; Liu, G.; Zhang, Y. On the accuracy of stochastic delay bound for network on chip. In Proceedings of the IEEE/ACM International Symposium on Networks-on-Chip (NOCS 2017), Seoul, Korea, 19–20 October 2017.
16. Wang, J.; Zhang, Y.; Huang, H. Analysis and compute of real-time signal flow delay for network on-chip. In Proceedings of the International Conference on Innovative Computing and Cloud Computing (ICCC 2011), Wuhan, China, 13–14 August 2011; pp. 107–112.
17. Ren, X.; Gao, D.; Fan, X.; An, J. Analysis of delay bounds for NoC based on improved asymmetric multi-channel router. J. Jilin Univ. 2014, 44, 782–787.
18. Ayed, H.; Ermont, J.; Scharbarg, J.; Fraboul, C. Towards a unified approach for worst-case analysis of Tilera-like and KalRay-like NoC architectures. In Proceedings of the IEEE International Workshop on Factory Communication Systems Proceedings (WFCS 2016), Aveiro, Portugal, 3–6 May 2016.
19. Lu, Z.; Zhao, X. xMAS-based qos analysis methodology. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2018, 37, 364–377.
20. Zhao, X.; Lu, Z. A Tool for xMAS-Based modeling and analysis of communication fabrics in Simulink. ACM Trans. Model. Comput. Simul. 2017, 27.
21. Qian, Y.; Lu, Z.; Dou, W. Analysis of worst-case delay bounds for on-chip packet-switching networks. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2010, 29, 802–815.
22. Long, Y.; Lu, Z.; Shen, H. Composable Worst-Case Delay Bound Analysis Using Network Calculus. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2018, 37, 705–709.
23. Saggio, A.; Du, G.; Zhao, X.; Lu, Z. Validating Delay Bounds in Networks on Chip: Tightness and Pitfalls. In Proceedings of the IEEE Computer Society Annual Symposium on VLSI, Montpellier, France, 8–10 July 2015; pp. 404–409.
24. Zhao, X.; Lu, Z. Empowering study of delay bound tightness with simulated annealing. In Proceedings of the Design, Automation and Test in Europe (DATE 2014), Dresden, Germany, 24–28 March 2014.
25. Zhao, X.; Lu, Z. Heuristics-aided tightness evaluation of analytical bounds in networks-on-chip. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2015, 34, 986–999.
26. Yang, X. Firefly Algorithms for Multimodal Optimization. Lect. Notes Comput. Sci. 2009, 5729, 169–178.
27. Umamaheswari, S.; Kirthiga, K.I.; Abinaya, B.S.; Ashwin, D. Cost aware task scheduling and core mapping on Network-on-Chip topology using Firefly algorithm. In Proceedings of the International Conference on Recent Trends in Information Technology (ICRTIT 2013), Chennai, India, 25–27 July 2013; pp. 657–662.
28. Gandomi, A.H.; Yang, X.; Alavi, A.H. Mixed variable structural optimization using Firefly Algorithm. Comput. Struct. 2011, 89, 2325–2336.
29. Ilamathi, K.; Rangarajan, P. Determining Effective Shortest Path in Asynchronous Network-on-Chip through Bio-Inspired Optimization Techniques. Wirel. Pers. Commun. 2018, 102, 3375–3392.
30. Sahu, P.K.; Manna, K.; Chattopadhyay, S. Application Mapping onto Butterfly-Fat-Tree based Network-on-Chip using Discrete Particle Swarm Optimization. Int. J. Comput. Sci. Appl. 2015, 115, 13–22.
31. Garcia, M.L.G.; Aedo, C.J.E.; Bagherzadeh, N. A new approach to the Population-Based Incremental Learning algorithm using virtual regions for task mapping on NoCs. J. Syst. Architect. 2019, 97, 443–454.
32. Bertozzi, D.; Jalabert, A.; Murali, S.; Tamhankar, R.; Stergiou, S.; Benini, L.; De Micheli, G. NoC synthesis flow for customized domain specific multiprocessor systems-on-chip. IEEE Trans. Parallel Distrib. Syst. 2009, 16, 113–129.
33. Chang, K.-C.; Chen, T.-F. Low-power algorithm for automatic topology generation for application-specific networks on chips. IET Comput. Digit. Techn. 2008, 2, 239–249.
34. Krishnan, S.; Karam, S.C.; Goran, K. Linear Programming based Techniques for Synthesis of Network-on-Chip Architectures. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2006, 14, 407–420.
35. Murali, S.; De Micheli, G. Bandwidth-constrained mapping of cores onto NoC architectures. In Proceedings of the Design, Automation and Test in Europe Conference and Exhibition (DATE 2004), Paris, France, 16–20 February 2004; pp. 896–901.
36. Concer, N.; Bononi, L.; Soulie, M. The connection-then-credit flow control protocol for heterogeneous multicore systems-on-chip. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2010, 869–882.

Figure 1. The concept and example of delay bound in network calculus.

Figure 2. Distance calculation between fireflies.

Figure 3. β movement.

Figure 4. α 2 movement.

Figure 5. The characteristic graph of industry patterns.

Figure 6. Application mapping example of VOPD using RAND, DPSO, and DBFA.

Figure 7. The characteristic graph of DVOPD.

Figure 8. The worst-case delay bound of local optimum and global optimum.

Figure 9. The delay bound for each flow of VOPD.

Figure 10. The delay bound for each flow of DVOPD.

Figure 11. 263DEC MP3DEC.

Figure 12. 263ENC MP3DEC.

Figure 13. MP3ENC MP3DEC.

Figure 14. MWD.

Figure 15. PIP.

Figure 16. Comparison of delay bounds of PIP in networks on chips (NoCs) of different scales.

Figure 17. Simulation experiments.

Table 1. Symbols and Definitions.

Symbol	Definition
$f [i, j]$	The flow injected at router i and ejected at router j
$α (i, j)$	The arrival curve of f[i, j] at ingress node i
$β_{i}$	The service curve of router i
$β_{eq}^{f_{(i, j)}}$	The equivalent service curve of flow f(i, j)
$a^{+}$	$a^{+} = a$ if $a > 0$; otherwise $a^{+} = 0$
$h (α, β)$	The maximum horizontal distance between the arrival curve α and the service curve β
$\in (., .)$	The function to compute the equivalent service curve
$\inf \{a, b\}$	The minimum of a and b
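
To make the role of $h (α, β)$ and $a^{+}$ concrete, the following is a minimal sketch rather than the authors' implementation. It assumes an affine arrival curve α(t) = σ + ρt and a rate-latency service curve β(t) = R(t − T)^+; these curve shapes, the parameter names (`sigma`, `rho`, `rate`, `latency`), and the sample values are illustrative assumptions that do not come from Table 1. Under those assumptions, and with ρ ≤ R, the standard network calculus result is h(α, β) = T + σ/R, which the sketch cross-checks numerically.

```python
import math


def plus(a: float) -> float:
    """a^+ as defined in Table 1: a if a > 0, otherwise 0."""
    return a if a > 0 else 0.0


def arrival(t: float, sigma: float, rho: float) -> float:
    """Assumed affine arrival curve alpha(t) = sigma + rho * t (t >= 0)."""
    return sigma + rho * t


def service(t: float, rate: float, latency: float) -> float:
    """Assumed rate-latency service curve beta(t) = rate * (t - latency)^+."""
    return rate * plus(t - latency)


def delay_bound_closed_form(sigma, rho, rate, latency) -> float:
    """h(alpha, beta) for the assumed curves: latency + sigma / rate."""
    if rho > rate:
        return math.inf  # arrivals outpace service, so no finite bound exists
    return latency + sigma / rate


def delay_bound_numeric(sigma, rho, rate, latency, horizon=100.0, steps=10_000):
    """Horizontal distance on a time grid, used only as a cross-check.

    For each t, the smallest d >= 0 with beta(t + d) >= alpha(t) is
    (latency + alpha(t) / rate - t)^+ for a rate-latency service curve.
    """
    worst = 0.0
    for i in range(steps + 1):
        t = i * horizon / steps
        d = plus(latency + arrival(t, sigma, rho) / rate - t)
        worst = max(worst, d)
    return worst


if __name__ == "__main__":
    # Hypothetical flow: burst of 4 flits, 0.25 flits/cycle,
    # served at 1 flit/cycle after a 2-cycle latency.
    print(delay_bound_closed_form(4.0, 0.25, 1.0, 2.0))
    print(delay_bound_numeric(4.0, 0.25, 1.0, 2.0))
```

The closed form is what an optimizer would evaluate repeatedly, so the grid-based version is useful only for sanity-checking a curve model before relying on it.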

Table 2. DBFA parameter settings.

Applications	Vertices	Edges	Fnum	GMax	$γ$	$α$	NoC Mesh
PIP	8	8	10	300	0.29	3	4 × 2
VOPD	16	21	20	400	0.3	4	4 × 4
MWD	12	12	30	600	0.3	5	4 × 4
263ENC MP3DEC	12	12	25	800	0.2	3	4 × 4
MP3ENC MP3DEC	13	13	25	800	0.18	3	4 × 4
263DEC MP3DEC	14	15	25	800	0.25	3	4 × 4
DVOPD	32	44	20	6000	0.28	6	8 × 4

Table 3. CPU running times.

Applications	IJCA’15 [30] ^a	DBFA	Applications	IJCA’15 [30]	DBFA
VOPD	116.45	97.20	263ENC MP3DEC	24.75	17.70
PIP	8.92	6.65	MP3ENC MP3DEC	41.28	32.98
MWD	21.06	13.39	263DEC MP3DEC	53.16	40.27

^a In order to compare the results under the same environment, we implemented the DPSO algorithm proposed in [30].
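
The relative saving in CPU running time can be read off Table 3 with a one-line formula, (baseline − DBFA) / baseline. The short sketch below applies it to the values copied from the table. The dictionary layout and the function name `reduction_percent` are illustrative choices, and the time unit is whatever Table 3 reports, since it is not stated in the table itself.

```python
# CPU running times copied from Table 3 as (DPSO from [30], DBFA).
# The unit is the one used in Table 3; it is not stated in the table.
CPU_TIMES = {
    "VOPD": (116.45, 97.20),
    "PIP": (8.92, 6.65),
    "MWD": (21.06, 13.39),
    "263ENC MP3DEC": (24.75, 17.70),
    "MP3ENC MP3DEC": (41.28, 32.98),
    "263DEC MP3DEC": (53.16, 40.27),
}


def reduction_percent(baseline: float, optimized: float) -> float:
    """Relative reduction of `optimized` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - optimized) / baseline


# Per-application reduction in running time of DBFA relative to DPSO.
reductions = {
    app: reduction_percent(dpso, dbfa) for app, (dpso, dbfa) in CPU_TIMES.items()
}

if __name__ == "__main__":
    for app, pct in sorted(reductions.items(), key=lambda item: -item[1]):
        print(f"{app:15s} {pct:6.2f}%")
```

On these figures DBFA is faster for every listed application, with the largest relative saving on MWD and the smallest on VOPD.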

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

