Making Path Selection Bright: A Routing Algorithm for On-Chip Benes Networks

Zhao, Li; Li, Zhiwei; Ma, Tianming

doi:10.3390/electronics13050981


by Li Zhao ¹,²,*, Zhiwei Li ¹ and Tianming Ma ¹

¹ School of Electronic and Electrical Engineering Shanghai, Songjiang Campus, Shanghai University of Engineering Science, Shanghai 201620, China

² Anhui Zhiguo Intelligent Technology Company Ltd., Hefei 239000, China

* Author to whom correspondence should be addressed.

Electronics 2024, 13(5), 981; https://doi.org/10.3390/electronics13050981

Submission received: 18 February 2024 / Revised: 29 February 2024 / Accepted: 29 February 2024 / Published: 4 March 2024

(This article belongs to the Special Issue Recent Advances in Millimeter-Wave Components and Integrated Technologies)


Abstract:

Optical interconnects are being discussed as a replacement for conventional electrical interconnects and are expected to be applied for future generations of high-performance supercomputers and data centers. Benes networks have attracted much attention because they require only 2 × 2 optical switches, which reduces the cost of rearrangeable nonblocking switching. However, optical power imbalances can significantly challenge receiver sensitivity. In this work, insertion loss (IL) fairness is proposed and applied to the field of switches to achieve a relative balance of optical path data transmission in Benes networks. Fairness can be achieved when the port count is small (4 × 4) if the IL between ports is balanced. When the number of ports is moderate (8 × 8), we must use a suitable algorithm or determine the appropriate operating wavelength to minimize the power imbalance. An efficient two-step algorithm (ETS) has particular advantages in solving the path fairness problem and mitigating the power imbalance. As the number of ports increases, the switch states and topology jointly worsen the power imbalance. Finally, the ETS algorithm narrows the dynamic range requirement to 13.66 dB, a 2 dB improvement, and achieves an extinction ratio of 24 dB and a bandwidth of 375 GHz, both of which outperform the conventional 32 × 32 Benes network.

Keywords:

routing algorithm; ring resonator; Benes network; path fairness; extinction ratio

1. Introduction

Driven by cloud computing and ever-increasing cloud content, data center (DC) networks are experiencing a surge in traffic, with an estimated 77% of total switched traffic by 2030 [1,2,3,4,5]. Like DCs, high-performance supercomputers (HPCs), such as Blue Gene/Q, PERCS/Power 775, and P7/IH, have widely used optical modules. Silicon-integrated optical switches, which benefit from CMOS compatibility and mature fabrication compared with their electronic counterparts, have attracted widespread interest. Optical interconnects [6,7,8] are being discussed as a replacement for conventional electrical interconnects and are expected to be applied for future generations of HPC and DC.

Recently, we demonstrated the first-ever bi-Benes architecture based on 32 × 32 reconfigurable nonblocking switches, which allowed us to suppress the power consumption to as low as 85 μW [3,4]. The Benes network has several favorable qualities. It employs only 2 × 2 switching elements. Low-port-count optical switches form a multilayer optical network that can expand to a larger scale. It enables nonblocking communication between any pair of idle ports. One can reroute the existing links to accommodate the new requests. In this context, designing an intelligent routing table for Benes has been a hot topic in recent years in both academia and industry [3,4,9,10,11,12,13].

Previous work focused on keeping crosstalk (XTL) and IL low. IBM found that the power imbalance of individual switches can lead to XTL [14]. They predicted that total XTL in the optical path would be reduced by 7 dB using a push–pull rather than a single-ended phase shifter drive scheme. Research at Columbia University has shown that the leading cause of XTL is not higher-order XTL but first-order XTL [15]. For this reason, an intelligent path mapping selection strategy that avoids first-order XTL shows a 16 dB improvement in the worst-case path power penalty. The University of Cambridge used a dilated Benes scheme to suppress XTL [16]. It employs Mach–Zehnder interferometers as switching gates and semiconductor optical amplifiers to compensate for passive losses or suppress XTL to −47 dB at the cost of increased power consumption. Subsequently, the research focused on reducing power consumption. In addition, the optical power dynamic range (PDR) should be as narrow as possible to ensure that the receiver’s sensitivity is sufficient for coverage. The same team at Columbia University iterated over all possible solutions and used a fixed routing table [17]. With cascaded switches, the 16 × 16 port network showed a PDR of 15 dB with a penalty of fewer than 1 dB. A general penalty-optimized routing approach selects the most favorable switch configuration. However, its computational requirements increase as the number of ports increases. In this case, no study has been conducted to derive an optimal routing table for a 32 × 32 Benes network.

Optical power imbalances can pose a significant challenge to receiver sensitivity. A compromise between IL and XTL is usually required to mitigate power imbalance between ports. However, both IL and XTL are highly susceptible to path imbalance. To address this problem, we propose a novel path selection process with automatic routing and an ETS in Benes networks. The proposed scheme addresses three problems: (1) how to mitigate the power imbalance of four-port Benes efficiently; (2) how to reduce the PDR on an eight-port scale efficiently; and (3) how to efficiently route the data on a 32 × 32 Benes network with two objectives: (1) maximize the extinction ratio (ER) and (2) minimize the first-order XTL. Our contributions can be summarized as follows:

The concept of IL fairness is applied to Benes networks. When fewer ports exist, the optimal routing algorithm (ORA) is proposed to reduce the imbalance at the receiver.
As the number of ports increases, a greedy algorithm (GRA) is proposed to suppress the PDR, which allows the signal to propagate under one and only one first-order XTL.
A cooperative ETS algorithm combining ORA and GRA improves ER performance by balancing maximum IL and XTL of the longest path.

The rest of this paper is organized as follows. In Section 2, we detail the ETS algorithm and its integer model. Section 3 studies the effect of the ETS algorithm on power imbalances and PDRs. In particular, the emphasis is put on ILs, XTLs, and ERs. Finally, Section 4 concludes the paper.

2. Network Topology

We selected the Benes network architecture, characterized by multiple paths between input and output pairs. Classical architectures like the Omega network have precisely one route between each node pair [18]. In the Benes and Bi-Benes networks, this problem is negligible. The main advantage of this scheme is that it needs only 2 × 2 optical switches. Thus, it is beneficial to use the Benes network to reduce the costs of rearrangeable nonblocking [19].

The topology determines the number of switching elements. The Benes architecture can be roughly divided into two parts: the first half of the Omega network, including the middle column, and the second half of the reverse Omega network [20]. In general, an N × N Benes network has $2\log_2 N - 1$ stages, each consisting of N/2 switches. In each switch, there are two ring resonators. Consequently, the N × N switch has $2N\log_2 N - N$ elements.

The routing table grows rapidly with the port count. Take 32 ports as an example; the Benes network needs $N\log_2 N - N/2$ switches, which results in $2^{N\log_2 N - N/2} = 2.238 \times 10^{43}$ possibilities or matchings. In a nonblocking manner, there are $N! = 32! = 2.63 \times 10^{35}$ permutations to realize between its inputs and outputs. If the N/2 × N/2 subnetworks are deterministic in the middle stage, the complexity of the routing table is reduced to $(N/2)! \times 2^{N} = 8.98 \times 10^{22}$.
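The counting argument above can be checked with a few lines of Python. This is a minimal sketch rather than part of the original work; the function name `benes_counts` and the dictionary keys are illustrative assumptions, and the formulas are taken directly from the expressions in this section.

```python
import math


def benes_counts(n):
    """Stage, switch, and routing-table counts for an n-port Benes network.

    Hypothetical helper: the name and return layout are illustrative only.
    """
    k = n.bit_length() - 1              # log2(n) for a power of two
    if n < 2 or (1 << k) != n:
        raise ValueError("port count must be a power of two")
    stages = 2 * k - 1                  # 2*log2(N) - 1 stages
    switches = stages * (n // 2)        # N*log2(N) - N/2 switches
    return {
        "stages": stages,
        "switches": switches,
        "ring_resonators": 2 * switches,            # two rings per switch
        "switch_settings": 2 ** switches,           # every bar/cross combination
        "permutations": math.factorial(n),          # N! input/output permutations
        "reduced_table": math.factorial(n // 2) * 2 ** n,  # (N/2)! * 2^N
    }


counts = benes_counts(32)
print(counts["stages"], counts["switches"], counts["ring_resonators"])
print(f"{counts['switch_settings']:.3e}")
print(f"{counts['permutations']:.3e}")
print(f"{counts['reduced_table']:.3e}")
```

Python integers are arbitrary precision, so the large counts are computed exactly and only rounded when printed.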

A deterministic N/2 × N/2 subnetwork does not imply that its internal steering never changes. Here, determinism indicates that the upper and lower halves of the central modules look up the same routing table. Although such an assumption is simplistic, it suffices for our present purpose of reducing the time complexity.

2.1. Problem Statement

The problem is to extract the $8.98 \times 10^{22}$ routing tables out of all $2.238 \times 10^{43}$ possibilities. Unlike the conventional approach, which employs the looping algorithm [21] to select the routing paths at random, the proposed approach uses the ETS algorithm to improve the key indicators. It improves performance by exploiting the path characteristics in the optimal routing table and consequently has the potential to maximize the ER while minimizing the XTLs.

To analyze the physical indicators, we introduce a feasible region, as shown in Figure 1. The x-axis indicates the actual wavelengths, while the y-axis indicates the transmission penalty. The figure zooms in on a narrow portion of the transmission spectrum, because the periodic nature of the spectrum makes the remainder redundant. The highest and lowest output powers correspond to IL and XTL, respectively. The width and height of the region indicate the bandwidth (BW) and the ER, respectively.

In previous works [9,10,11,12], efforts were devoted to decreasing IL and keeping XTL low; see Figure 1A. However, a low IL does not always lead to a low XTL. Thus, these two targets are achieved separately. After adopting the ETS algorithm, the ER and BW can be simultaneously enhanced; see Figure 1B.

2.2. Mathematical Model

2.2.1. Model Parameters

We model the switch allocation problem as an integer problem [22]. To start with, the input ports and the output ports are numbered as $\{I_1, I_2, \dots, I_j, \dots, I_N\}$ and $\{O_1, O_2, \dots, O_j, \dots, O_N\}$ from top to bottom. The number of network stages is $M = 2\log_2 N - 1$. Finally, $\hat{S}$ denotes the columns of the outputs of each intermediate stage.

Two types of interconnections are needed: (1) the fixed links, i.e., those that connect the switch outputs to the next stage in a fixed manner; and (2) the reconfigurable links, i.e., the internal structure of the switching elements. Herein, $\hat{M}$ denotes the columns of the solid links, and $\hat{A}$ denotes the columns of the reconfigurable links.

We use an integer $i = 1, \dots, M$ to denote the intermediate variables from one stage to the next; the terms are numbered from left to right. By means of the integer variables $j, k = 1, \dots, N/2$, which denote the intermediate terms of a given stage, it is possible to view the above terms from top to bottom. For any integer j, let k denote an integer different from j.

$\hat{M}_i$ denotes the columns of the solid links. In particular, it collects switches $m_{i,j,k}$ in each column to form a matrix-like structure. A binary $m_{i,j,k}$ represents whether port j is connected to port k in the $i$-th stage. When the value is zero, port j is not connected to port k. Otherwise, it is connected.

$\hat{A}$ denotes the columns of the reconfigurable links. In other words, $\hat{A}$ collects switches $a_{i,k}$ in each column to form a matrix-like structure. For each request, the central controller schedules the switches based on the routing table. When the value is zero, the switch $a_{i,k}$ is in the bar state. Otherwise, it is in the cross state.

Note that $\hat{A}_i$ collects a set of switches in the $i$-th stage with new index numbers from 1 to $N/2$. $\hat{A}_i$ has twice as many entries as there are switch states in that stage. For example, if the states of a four-port switch are $\{0, 1\}$, then $\hat{A}_1 = \{1, 2, 4, 3\}$ due to the presence of reconfigurable links.

Define $(\cdot)$ as the matrix operator that returns index numbers from inputs to outputs. Assume $\hat{A}_1 = \{1, 2, 4, 3\}$ and $I = \{1, 2, 3, 4\}$. A matrix [1, 2, 4, 3; 1, 2, 3, 4] is formed by the column vectors $[\hat{A}_1; I]^T$. Sort the columns so that $\hat{A}_1$ is in ascending order. Then, the matrix becomes [1, 2, 3, 4; 1, 2, 4, 3]. The second row represents $\hat{A}_i(I)$. Therefore, the input $I = \{1, 2, 3, 4\}$ can be mapped to the output $\hat{A}_i(I) = \{1, 2, 4, 3\}$.
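The sort-based operator described above can be written in a few lines. The following is a minimal sketch under the assumption that both rows are plain lists of port indices; the name `apply_operator` is illustrative and does not appear in the original work.

```python
def apply_operator(key_row, value_row):
    """Sort the columns of [key_row; value_row] by key_row and return the
    reordered second row, as in the worked example above."""
    if len(key_row) != len(value_row):
        raise ValueError("both rows must have the same length")
    columns = sorted(zip(key_row, value_row))   # ascending order of the first row
    return [value for _, value in columns]


a_1 = [1, 2, 4, 3]          # reconfigurable column from the text
inputs = [1, 2, 3, 4]
print(apply_operator(a_1, inputs))
```

Because the entries of the first row are distinct port indices, the sort is unambiguous and the result is a permutation of the second row.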

$\hat{S}_i$ denotes the columns of ports of the middle stage switches. As before, it collects switches $s_{i,j}$ in each column to form a matrix-like structure. The value of $s_{i,j}$ represents the output port of the middle stage, which is used to monitor the output signal from different inputs. More generally, the $\hat{S}_M$ of the final stage is necessary from the outset to represent the destinations.

It is worth noting that in this problem, the fixed links $\hat{M}$ and the traffic request $I, O$ are known. However, $\hat{A}$ is not yet known, and the performance of $\hat{S}_i$ cannot be directly evaluated. For this reason, we need two auxiliary graphs.

2.2.2. Auxiliary Graphs

As usual, the first step is to calculate the routing tables in the traditional manner [3,4]. To provide maximum BW, we previously demonstrated an auxiliary graph to establish connections between all input and output ports. It turns out that a Benes network is nonblocking if there are no unreachable ports. As the intermediate stages are developed for such purpose, the final stage of it should be identical to the outputs. Equation (1) can solve this problem. Since their sum is always constant, the difference should be either a sum of squares or an absolute value.

The problem of routing a set of connections through a Benes network can be reformulated as a graph edge-coloring problem. The graph $G_1$, which determines the set of routes, has one vertex for every first-stage switch and one vertex for every last-stage switch. An edge is added between a vertex in the first set and a vertex in the second set for every connection. The colors black and red indicate whether the connection is routed through the upper or lower half of the switch. Given this graph, the next step is to assign colors to the edges so that no two edges incident to the same switch are given the same color. Then, using graph $G_1$ recursively, one can exhaust all routing tables to obtain $R = [a_{11}, a_{12}, \dots, a_{NM}]$ of the switching states. The recursion process is described in Equation (2).
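One level of the edge-coloring step can be sketched as follows. This is an illustrative implementation of the classical looping-style 2-coloring, not the authors' ETS procedure; it assumes 0-based port numbers, that ports 2k and 2k+1 share a switch, and that color 0 means the upper half and color 1 the lower half. The function name `color_connections` is hypothetical.

```python
def color_connections(perm):
    """Assign each connection i -> perm[i] to the upper (0) or lower (1) half.

    Two connections that share a first-stage switch, or a last-stage switch,
    always receive different colors.
    """
    n = len(perm)
    if n % 2 != 0 or sorted(perm) != list(range(n)):
        raise ValueError("perm must be a permutation of an even number of ports")
    inverse = [0] * n
    for inp, out in enumerate(perm):
        inverse[out] = inp
    color = [None] * n
    for start in range(n):
        inp = start
        while color[inp] is None:
            color[inp] = 0
            # The connection ending on the same last-stage switch takes the other half.
            partner = inverse[perm[inp] ^ 1]
            color[partner] = 1
            # The connection sharing the partner's first-stage switch continues the loop.
            inp = partner ^ 1
    return color


print(color_connections([1, 0, 2, 3]))
```

Applying the same function to the two half-size permutations induced by the coloring gives the recursive enumeration described in the text; choosing the opposite starting color in a loop yields an alternative, equally valid routing.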

In particular, we use graph $G_2$ to evaluate the performance of the routing tables directly. Compared with the traditional MLSE method [17], a perfect matching graph is found by the balanced matrix [23]. For each matching R in the routing table, we construct an auxiliary graph $G_2$ to transform the switching state into a route label $\hat{a}_{i,k}$. Likewise, this element can only take the value 0 or 1. If the value is zero, the connection routes to the upper port of the switch; if it is one, to the lower port.

Take an eight-port Benes network as an example. The first half of the auxiliary graph $\{c_1, c_2\}$ is much more important than the second half $\{b_1, b_2, b_3\}$. The first half of the label represents the features. The second half of the label represents the address information. Therefore, for the same matching R, the second half is the same and uninformative.

In this paper, we use the first half of the route label $\hat{a}_{i,k}$ as the metric because the first half is more significant in terms of features. The logic behind this is that if the columns in the first half match the features of {01010101, 00110011} more closely, the matching will be recognized as the best match. More columns can be characterized similarly. The best match in the routing table will be selected, and a path will be created. Specifically, we give examples of the auxiliary graphs $G_1$ and $G_2$ in Section 3.2.

In summary, we use graph $G_1$ recursively to enumerate the switching states $R = [a_{11}, a_{12}, \dots, a_{NM}]$ that realize a given arbitrary permutation. Although there are $2^{x}$ possible choices for the same permutation request, a better configuration could be found through the auxiliary graph $G_2$. The ideal matching exists if $G_2$ satisfies the specific constraints of Equations (3) and (4). Otherwise, they are called additional matchings. Later, near-optimal matchings can be found if $G_2$ is subject to the GRA constraint; see Equations (5) and (6). Figure 2 illustrates the process using the auxiliary graphs $G_1$ and $G_2$.

2.2.3. Model and Constraints

End-to-end routing is feasible if the input permutation has the same address as the network input port to which it is connected. The problem then can be formulated as follows:

$$\min \sum_{j=1}^{N} \left| O_j - \hat{S}_{M,j} \right|^{2} \tag{1}$$

$$\hat{S}_i = \begin{cases} \hat{M}_i(\hat{A}_i(I)), & i = 1 \\ \hat{M}_i(\hat{A}_i(\hat{S}_{i-1})), & i = 2, \dots, M-1 \\ \hat{A}_i(\hat{S}_{i-1}), & i = M \end{cases} \tag{2}$$

$$\hat{a}_{i,j} = \hat{a}_{i,j+\tau} + 1, \quad j = 1, \dots, (M-1)/2 \tag{3}$$

$$\tau = 2^{j} \tag{4}$$

$$\max \sum_{i,j} a_{i,j} \tag{5}$$

$$\max \left| \sum_{i=1}^{(M-1)/2} a_{i,j} \right| - \left| \sum_{i=(M+1)/2}^{M} a_{i,j} \right| \tag{6}$$
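Equations (1) and (2) can be read as a stage-by-stage propagation followed by a mismatch score. The sketch below illustrates that reading under several assumptions that are not stated in the original: every column is stored as a permutation list, the operator is the sort-based one from Section 2.2.1, and the four-port fixed links are taken to be {1, 3, 2, 4}. All function names are hypothetical.

```python
def apply_operator(key_row, value_row):
    """Sort the columns of [key_row; value_row] by key_row; return the second row."""
    return [value for _, value in sorted(zip(key_row, value_row))]


def propagate(inputs, switch_cols, link_cols):
    """Equation (2): alternate reconfigurable columns A_i and fixed links M_i.

    switch_cols holds M columns and link_cols holds M - 1 columns, because the
    last stage is not followed by fixed links.
    """
    if len(link_cols) != len(switch_cols) - 1:
        raise ValueError("expected one fewer link column than switch columns")
    state = list(inputs)
    for i, a_col in enumerate(switch_cols):
        state = apply_operator(a_col, state)
        if i < len(link_cols):
            state = apply_operator(link_cols[i], state)
    return state


def objective(outputs, final_state):
    """Equation (1): squared mismatch between requested and reached outputs."""
    return sum((o - s) ** 2 for o, s in zip(outputs, final_state))


# Four-port example (M = 3): assumed fixed links and all switches in the bar state.
links = [[1, 3, 2, 4], [1, 3, 2, 4]]
all_bar = [[1, 2, 3, 4]] * 3
final = propagate([1, 2, 3, 4], all_bar, links)
print(final, objective([1, 2, 3, 4], final))
```

A candidate set of switch columns realizes the request exactly when the objective evaluates to zero, which is how a brute-force initialization phase could filter matchings.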

2.3. Algorithm and Constraints

The algorithm includes the initialization phase, optimal phase, and near-optimal phase; see Figure 2. When the optimization result cannot be found, the near-optimal algorithm is usually used. In the initialization phase, a brute-force method could search over all possible switch states for an input/output permutation that achieves a complete connection that minimizes the objective Equation (1) in a reasonable amount of time. Each node had a single optical port and could serve only one flow at any instant, modeled by constraint (2). Once the judge receives the switch states from the register, it divides a single matrix into two parallel parts so that the switch elements can be executed simultaneously: the first half $\{a_{i,j}\}$, where $i = 1, \dots, \log_2 N$; and the second half $\{a_{i,j}\}$, where $i = \log_2 N + 1, \dots, 2\log_2 N - 1$.

ORA Solution: Finding an optimal solution could be of high time complexity and hence prohibitive for real-time applications. Here, we verify the existence by Equations (3) and (4), which have lower computation costs and do not require global network information. The integer program above is a variant of the 2-coloring problem. The integer program can be solved optimally by finding the same vector, V. In the next section (Section 3.1), we show that the maximum number of the optimal solution can be minimal, amounting to only 5% of the total permutations. Therefore, we also use GRA to find the solutions efficiently.

GRA Solution: Only nonoptimal solutions are forwarded to the second stage, which resolves the dynamic range or XTL level issue based on a maximum bar-state target. Constraint (5) checks whether the signals have large dynamic ranges. In addition, constraint (6) can further lower the XTL and improve the performance when the same dynamic range is observed. The greedy approach is faster but only near-optimal.
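A greedy selection that follows Equations (5) and (6) literally might look like the sketch below. It is an assumption-laden illustration, not the authors' implementation: each candidate is an M-row list of 0/1 switch states with the encoding of Section 2.2.1, Equation (6) is summed over all switches j, and the function names are hypothetical.

```python
def gra_score(states):
    """Return (Equation (5) value, Equation (6) value) for one candidate.

    states[i][j] is the 0/1 state a_{i,j} of switch j in stage i + 1.
    """
    half = (len(states) - 1) // 2                     # (M - 1) / 2 leading stages
    total = sum(sum(row) for row in states)           # Equation (5)
    first = sum(sum(row) for row in states[:half])    # stages 1 .. (M - 1)/2
    second = sum(sum(row) for row in states[half:])   # stages (M + 1)/2 .. M
    return total, abs(first) - abs(second)            # Equation (6) as tie-breaker


def gra_select(candidates):
    """Pick the candidate maximizing Equation (5), then Equation (6)."""
    return max(candidates, key=gra_score)


candidates = [
    [[1, 1], [0, 0], [0, 0]],
    [[0, 0], [1, 0], [0, 1]],
    [[1, 0], [0, 0], [0, 0]],
]
print(gra_select(candidates))
```

Using a tuple as the sort key applies the second criterion only among candidates that tie on the first, which mirrors the role the text gives to constraint (6).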

3. Numerical Results

To verify the superiority of the ETS algorithm, we have performed the following simulations based on Matlab 6.0 and INTERCONNECT 14.1. We used the Windows 11 platform on a Mechrev laptop with an Intel i9 core. To increase the reliability, we simulated the 2 × 2 structure using both an analytical model and simulation. Matlab software 2017a is responsible for the analytical results, and the device-level simulation is implemented using Lumerical 2020 R2. Based on this, we assessed the scalability by building Benes networks of sizes 4, 8, and 32, namely (a) a simulation of 4-port Benes for all paths, (b) a simulation of 8-port Benes for randomly selected permutations, and (c) a simulation of 32-port Benes for two extremes: all bar states and all cross states.

As long as the number of stages is equal in a Benes network, the number of switching elements traversed from any input to any output is constant. Ideally, all optical paths between input and output ports are the same. However, the optical paths may depend heavily on the number of crossing waveguides and bar-state switches, which should be taken into account. Connections with the maximum number of crossing waveguides and bar states are considered the longest path. In contrast, connections with the minimum number of crossing waveguides and the minimum number of bar states are considered the shortest path.

The all-bar and all-cross configurations are two extremes that are widely discussed in the previous literature [22]. Other switching states can be a tradeoff between these two factors. Therefore, in our work, we take these two factors impacting ILs and XTL levels into consideration in order to obtain a reasonable tradeoff.

3.1. Two-Port Switching Element

Figure 3a illustrates the switching mechanism of a 2 × 2 resonator-based switch. It comprises two silicon resonator-based switches coupled to a waveguide crossing, where two resonators are placed in the quadrant. It uses the electro-optic effect and has a similar design to the one in [24,25]. In this way, two receivers can share a waveguide in parallel to switch to the appropriate receiver. The method is preferable with a low IL of 1.5 dB and an XTL of 30 dB.

$$T_{on11}(\theta) = \frac{O_1}{I_1} = \frac{\kappa_2 \kappa_3 A^{0.25} \exp(i\,0.25\theta)}{1 - t_2 t_3 A \exp(i\theta)} \tag{7}$$

$$T_{on12}(\theta) = \frac{O_2}{I_1} = \frac{t_2 - t_2^{2} t_3 A \exp(i\theta) + \kappa_2^{2} t_3 A \exp(i\theta)}{1 - t_2 t_3 A \exp(i\theta)} \tag{8}$$

$$T_{off11}(\theta + \Delta\theta) = \frac{O_1}{I_1} = \frac{\kappa_0 \kappa_1 A^{0.25} \exp(i\,0.25\theta + i\,0.25\Delta\theta)}{1 - t_0 t_1 A \exp(i\theta + i\Delta\theta)} \tag{9}$$

$$T_{off21}(\theta + \Delta\theta) = \frac{O_1}{I_2} = \frac{t_0 - t_0^{2} t_1 A \exp(i\theta + i\Delta\theta) + \kappa_0^{2} t_1 A \exp(i\theta + i\Delta\theta)}{1 - t_0 t_1 A \exp(i\theta + i\Delta\theta)} \tag{10}$$

The device transmission was calculated as a function of the injected-carrier density under the following assumptions. The optical fields at the two input and two output ports were denoted as $I_1$, $I_2$, $O_1$, $O_2$, and $T_{ON}$, $T_{OFF}$ were defined as the transmission coefficients at the ON state and OFF state, respectively. Similarly, $t_i$ and $\kappa_i$ were the self-coupling and cross-coupling coefficients, assuming $t_i^{2} + \kappa_i^{2} = 1$, where i = 1, …, 4. In the ideal case, there were no losses.

Figure 4 demonstrates that the spectral responses of the $\Delta\theta = \pi$-shifted switch were Lorentzian and inverse Lorentzian curves [25]. We can observe that the resonance wavelengths correspond to the peaks. The distribution of the model state is represented in Equations (7)–(10). They follow the same principle depicted in [26].
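The Lorentzian-like drop-port response of Equation (7) can be evaluated numerically. The sketch below is only an illustration: the coupling value 0.9 is a hypothetical choice, not a parameter from Table 1, and the cross-coupling coefficients are derived from the lossless-coupler relation $t_i^2 + \kappa_i^2 = 1$ stated above. The function name `drop_port_power` is an assumption.

```python
import cmath
import math


def drop_port_power(theta, t_2, t_3, amplitude=1.0):
    """Power transmission |T_on11|^2 of Equation (7) at round-trip phase theta."""
    kappa_2 = math.sqrt(1.0 - t_2 ** 2)     # lossless coupler: t^2 + kappa^2 = 1
    kappa_3 = math.sqrt(1.0 - t_3 ** 2)
    numerator = kappa_2 * kappa_3 * amplitude ** 0.25 * cmath.exp(1j * 0.25 * theta)
    denominator = 1.0 - t_2 * t_3 * amplitude * cmath.exp(1j * theta)
    return abs(numerator / denominator) ** 2


on_resonance = drop_port_power(0.0, 0.9, 0.9)        # theta = 0: resonance peak
off_resonance = drop_port_power(math.pi, 0.9, 0.9)   # theta = pi: between resonances
print(on_resonance, off_resonance)
print(10 * math.log10(on_resonance / off_resonance), "dB contrast")
```

Sweeping `theta` over one free spectral range traces the peak that Figure 4 associates with the resonance wavelength.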

The simulation setup is shown in Table 1. It shows a free spectral range of 26 nm. For the most accurate measurements, the operating wavelength $\lambda_c$ is considered, e.g., 1549.99 nm. In the subsequent work, one of the working wavelengths is used to analyze the signal.

Microrings cannot distinguish a particular resonant wavelength from others. Thus, the switch cannot be set to bar- or cross-state separately for each wavelength. The output port must be maximized at a specific resonant wavelength and minimized at other nonresonant wavelengths. Indeed, the optimization process is performed separately for each set of connections for each wavelength. In this design, the operating wavelength of the signals should be close to a particular resonant wavelength of the microrings. Hence, more than one MRR for each wavelength is beyond the scope of our discussion.

3.2. ORA Scheme

The first experiment explored how to mitigate the power imbalance of four-port Benes efficiently. Before proceeding, we need to introduce an ORA scheme. With this routing table scheme, we can see four matchings from top to bottom. In this configuration, spectral measurements were obtained in the same manner as [4,26]. The optical IL and XTL simulations are presented in Figure 5.

For a given permutation {2134}, notice that two auxiliary graphs were used. In $G_2$, these four cases can be represented as {0101; 0011; 1001}, {1010; 0011; 1001}, {0110; 0011; 1001}, {1001; 0011; 1001}, respectively. The last two columns were the same because the destination is the same. If the first column satisfies the {0101} pattern, the matching is optimal.

{Blue, Brown, Yellow, and Purple} represent signals from ports {1, 2, 3, and 4}, respectively. Using the above definitions, we demonstrate all signals and XTL paths in one plot. The signal powers are superimposed in the same column as an indication of the signal fluctuations at the output ports. In the case of an ideal matching, the other three signals disappear in this plot because the last signal covers them.

When the number of ports is small (4 × 4), fairness is achieved as long as the IL between ports is balanced. In this regime, IL fairness is defined as the difference in the number of bar states between the longest and shortest paths. Fairness is achieved when this difference is adjusted to a minimum value. Thus, the optimal solution balances two conflicting objectives: IL and XTL. The 14 dB ER value observed at the output is competitive with most existing research results and sufficient for most applications.
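The IL fairness metric defined above reduces to a difference of bar-state counts. The following minimal sketch assumes that each path is given as the sequence of switch states it traverses (0 = bar, 1 = cross, as in Section 2.2.1) and that the longest and shortest paths are those with the most and fewest bar states; waveguide crossings are ignored for simplicity. The function name `il_fairness` is hypothetical.

```python
def il_fairness(paths):
    """Difference in bar-state count between the longest and shortest paths.

    paths: list of per-path switch-state sequences, 0 = bar and 1 = cross.
    A value of 0 means every path sees the same number of bar-state switches.
    """
    if not paths:
        raise ValueError("at least one path is required")
    bar_counts = [sum(1 for state in path if state == 0) for path in paths]
    return max(bar_counts) - min(bar_counts)


balanced = [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
unbalanced = [[0, 0, 0], [1, 1, 1], [0, 1, 0], [1, 0, 1]]
print(il_fairness(balanced), il_fairness(unbalanced))
```

Among several matchings that realize the same permutation, the one with the smallest value would be preferred under this definition.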

Because Equations (3) and (4) are nontrivial, solving this optimization problem becomes exceptionally complicated, especially for large values of N. Similarly, we can build the state matrix shown on the left in Figure 5a. Adding constraints (5) and (6) to the state matrix helps to speed up the process. Figure 5d reveals the contribution of the near-optimal solutions for this problem. With the increase in N, the greedy solution gradually dominates. The proposed approach has very low time complexity. A comparison with other state-of-the-art algorithms can be found in Table 2.

3.3. Routing Table for Four-Port Benes

The advantage of the ETS algorithm is that it builds the routing table with the least PDR. Figure 6 displays a routing table, which ensures it covers all 256 paths of 64 matchings. It makes the average power penalty as low as possible.

We first study the impact on the operating wavelengths and compare the power penalty range with the brute force. The curve in Figure 6a has different slopes with seven operating wavelengths. However, when the wavelength is 1578.04 nm, we obtain the most extensive fluctuation range of the optical power between 7.5 dB and 3.5 dB via the looping algorithm. Reducing the routing table generally leads to a small range of fluctuation.

Next, we study the routing table and compare the results with Chen’s routing table. Generally, the proposed algorithm has similar but slightly lower results than Chen’s method. The ETS algorithm has the lowest power penalty levels among all wavelength channels. Different choices of routing tables lead to differences in power penalty. Chen’s solution uses RMSE for the power penalty, which would track all paths to the optimal selection of the paths. On the contrary, the ETS algorithm is more straightforward than Chen’s solution by selectively applying path constraints. Nevertheless, the shapes of the two (ETS algorithm and Chen’s method) do not change significantly.

3.4. Ratio of the Optimal to Near-Optimal Algorithm

Generally, the routing table is jointly decided by the optimal and near-optimal algorithms, the ratio of which is evaluated here. As mentioned, the Benes network consists of $2N\log_2 N - N$ SEs. Connections of all input and output ports give $2^{N\log_2 N - N/2}$ random matchings, but only $N!$ are unique permutations. When two or more matchings x were found, they provided different patterns for the same permutation, where the input and output ports are connected identically. Figure 7 shows that the matchings fit a Poisson model, given in Equations (11) and (12), where $\lambda$ is the model parameter and x denotes the x-th arbitrary matching.

$$f(x \mid \lambda) = \frac{\lambda^{x}}{x!} e^{-\lambda}, \quad x = 0, 1, 2, \dots, N! \tag{11}$$

$$\lambda = \log_2 N - 1 \tag{12}$$

In Figure 7, the optimal ORA solution decreased from 16% to 5% as the port count (N) increased from 4 to 8. As the central peak moved to the right, the near-optimal GRA solutions to the permutation problem became larger. Ultimately, the greedy model gradually dominates in computing the routing tables.
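Equations (11) and (12) can be evaluated directly to see how the distribution shifts as the port count grows. The sketch below is illustrative only; the function name `matching_pmf` is an assumption, and no attempt is made to reproduce the percentages read from Figure 7.

```python
import math


def matching_pmf(x, n_ports):
    """Poisson probability of Equation (11) with lambda from Equation (12)."""
    lam = math.log2(n_ports) - 1.0                     # Equation (12)
    return lam ** x / math.factorial(x) * math.exp(-lam)


for n_ports in (4, 8, 32):
    row = [round(matching_pmf(x, n_ports), 4) for x in range(6)]
    print(n_ports, row)
```

Because the mean of a Poisson distribution equals $\lambda$, the peak moves to the right as N increases, which is consistent with the trend described for Figure 7.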

3.5. Routing Table for Eight-Port Benes

The working of the GRA algorithm was illustrated using the example of the eight-port Benes network. It routed 8! permutations by setting up 20 switches in the network. When $2^{20}$ matchings are uploaded into the register, identical solutions are collected in Figure 8. We can rule out a sequence of matchings to satisfy both IL and XTL requirements by minimizing the switches in bar states.

Consider the eight-port Benes as an example. Two typical greedy methods are commonly used to compute the routing table. To guarantee a low IL and XTL level, the priority is to minimize the number of switches in the bar states, as discussed in [22]. Next, the second priority is set so that switches in bar states are encouraged to be placed in the first half of the network rather than those in cross states. Generating routing tables by this practice has the advantage of less fluctuation than the conventional looping algorithm.

Figure 9a,b shows their cumulative density function (CDF) plot on wavelength dependence. We demonstrate the benefit of using the GRA to adjust the routing tables for parallel paths. The different CDFs follow a staircase pattern, representing the typical channel dynamics from all input ports to the output ports. As these figures show, the performances of the bar-state and cross-state priorities are almost the same. This indicates that the selection of the fixed routing table limits the fluctuation to a small range.

Moreover, an operating wavelength of 1578.04 nm was observed to obtain the best responsivity within the smallest fluctuation range. Specifically, the optimal IL changes from −13 dB to −11 dB as the XTL level increases from −30 dB to −20 dB. On the other hand, most operating wavelengths, including the wavelength of 1549.99 nm, yield the worst responsivity with relatively unsatisfactory results.

3.6. Routing Table for 32-Port Benes

Figure 10a,b shows the routing tables of the 32 × 32 switch network for the two extreme cases. A comparison is made between the same type of design with and without the proposed algorithm, and it runs at 50 Gbps. The transmission performance can be further enhanced over that reported in our previous work [3,4].

Traditional loop algorithms show a very high degree of symmetry of the paths, such that the worst-case straight line with the largest IL occurs exactly twice, i.e., between the input and output ports of 1 and 32. The problem with the traditional approach is that it passes through the worst case twice. Solving this problem is to apply the ETS algorithm to replace the conventional counterpart. The main idea of our ETS algorithm is to solve the fairness problem with the longest path. Specifically, the wavelength between input and output ports 2 and 31 is the longest path for the conventional method. Accordingly, the straight line with the largest IL occurs only once, i.e., between input port 1 and output port 1. In addition, the longest path also occurs only once, i.e., from input port 32 to output port 32.

Next, we explain why the ETS algorithm needs one and only one first-order XTL at the receiver. We would like to reduce the number of first-order XTLs to exactly one to balance the width and height of the feasible region. The more first-order XTLs there are, the stronger the signal filtering, and the stronger the filtering, the smaller the feasible region becomes. Conversely, if the first-order XTL is missing, the ER is significantly reduced, and the accompanying critical parameters, such as XTL and ER, are often difficult to measure accurately. This is because the valley region of the first-order XTL can be used to adjust the operating wavelength and resonance.

The optical spectra of the system span wavelengths from 1510 nm to 1620 nm and cover all output terminals. One can observe that the resonances were aligned at the same wavelength, here 1549.99 nm.

Figure 11 shows how the width of the feasible region was affected by the choice of the bar states at a wavelength of 1549.99 nm. The largest number of bar states provided the widest BW, 1620 nm − 1510 nm = 110 nm, in the traditional all-bar method. In contrast, the smallest number of bar states resulted in the narrowest BW, 1.7 nm, in the traditional all-cross method. The optimal ORA and near-optimal GRA methods significantly improve the BW over their conventional counterparts: the 1.7 nm/0.8 nm × 100 GHz = 212.5 GHz BW was extended to 300 GHz and 375 GHz, respectively.
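The conversion used here treats 0.8 nm of optical bandwidth as 100 GHz, the usual approximation near 1550 nm. A short Python sketch of this arithmetic follows. The function and constant names are illustrative, and the 2.4 nm input is the width that corresponds to 300 GHz under the same conversion (it also appears as the lower eye-width bound in Table 3).

```python
NM_PER_100_GHZ = 0.8  # approximation near 1550 nm, as used in the text

def bw_nm_to_ghz(bw_nm):
    """Convert a spectral width in nm to GHz using 0.8 nm ~ 100 GHz."""
    return bw_nm / NM_PER_100_GHZ * 100.0

baseline = bw_nm_to_ghz(1.7)  # traditional all-cross method
narrow = bw_nm_to_ghz(2.4)    # width corresponding to 300 GHz
wide = bw_nm_to_ghz(3.0)      # width corresponding to 375 GHz

print(baseline, narrow, wide)
print(narrow - baseline, wide - baseline)  # gains over the 1.7 nm baseline
```

The two differences printed on the last line are the BW gains relative to the 212.5 GHz baseline, namely 87.5 GHz and 162.5 GHz.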

As discussed in Section 2, the ETS algorithm has the potential to improve multipath fairness, which depends on the topology and routing table. The losses are mainly caused by the number of switches in bar states and by imperfect waveguide crossings. When the ERs are comparable, the proposed method produces a distinct feasible region with approximately only one interfering signal, as shown in Figure 11b. This is because a greedy algorithm is used and maximizing the number of bar-state switches is given priority. For the optical path from input 4 to output 4, the ETS method achieves an optimized ER = 24 dB with a BW of 3 nm/0.8 nm × 100 GHz = 375 GHz.

In both extreme cases, the ETS shifts the XTL and IL curves from left to right; see Figure 11a,b. When the ORA algorithm uses cross-state priority instead of bar-state priority, the shifts are larger. When the GRA method is used, the shift is smaller. Because the trends of IL and XTL are not consistent, the ER is almost equal in both cases. Here, the ER of the GRA method is higher than that of the conventional method, whereas the ER of the ORA method is lower than that of the traditional method. This may be due to the fact that the ORA routing method is superior to the GRA method.

Finally, we investigated the impact of the ER indicators on the switch fabric under varying cross-state conditions; see Figure 11c,d. The optimal solution, with multiple switches in the cross state, reduced the dynamic range requirement to 13.66 dB, a 2 dB improvement. The near-optimal solution, with all switches set to the cross state, constrained the dynamic range requirement to 4.27 dB, a gain of 5.1 dB.
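The dynamic range requirement is taken here as the spread between the strongest and weakest signals that the receiver must handle. A minimal Python sketch of that bookkeeping follows, under this assumption. The per-path power values are placeholders chosen for illustration, not results from this work, and the name `dynamic_range_db` is hypothetical.

```python
def dynamic_range_db(received_dbm):
    """Spread (dB) between the strongest and weakest received powers (dBm)."""
    if not received_dbm:
        raise ValueError("need at least one path")
    return max(received_dbm) - min(received_dbm)

# Placeholder per-path received powers in dBm (assumed, for illustration only).
before = [-11.0, -14.5, -19.0, -26.0]
after = [-12.0, -15.0, -18.5, -24.0]

improvement = dynamic_range_db(before) - dynamic_range_db(after)
print(dynamic_range_db(before), dynamic_range_db(after), improvement)
```

A routing table that raises the weakest path or lowers the strongest one narrows this spread, which is the sense in which path fairness relaxes the receiver requirement.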

Within this wavelength range, the spectral shapes of all symmetrical solutions, as depicted in Figure 12a,c,d, changed in a consistent pattern. Generally, a symmetrical solution with many bar states introduces considerable variations in the IL and XTL. To address this issue, we implemented the optimal asymmetrical solution, illustrated in Figure 12b. Although this solution produced different shapes across the output ports due to its asymmetry, the associated fluctuations were minimal, which mitigated the adverse effects on the measured ER. A comparison with other state-of-the-art algorithms on physical metrics can be found in Table 3.

4. Conclusions

This paper validates an intelligent routing algorithm for path selection in Benes networks through simulation and an analytical model. The proposed method combines the advantages of the optimal and greedy algorithms to improve the fairness of the paths. Using the proposed scheme, the dynamic range requirements at the receiver side of small-port (8 × 8 and 4 × 4) Benes networks can be reduced by 2 dB and 5.1 dB, respectively. The feasible region of the large-port (32 × 32) Benes network becomes larger: the area height metric, ER, is improved by 1 dB and 6 dB, and the area width metric, BW, is improved by 162.5 GHz and 87.5 GHz at the two extremes, respectively. The numerical examples also show that, when the ERs are comparable, the proposed greedy method produces a clean feasible region with only one first-order XTL.

Author Contributions

Conceptualization and writing—original draft, formal analysis, and resources, L.Z.; review, editing, and validation, Z.L. and T.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of China grant number 62201338 and Overseas Young Talent Introduction of Anhui Province (2023).

Data Availability Statement

The data presented in this study are available in this article.

Conflicts of Interest

Author L.Z. was employed by Anhui Zhiguo Intelligent Technology Company. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

BW	Bandwidth
CDF	Cumulative Distribution Function
ER	Extinction Ratio
ETS	Efficient Two-Step Algorithm
GRA	Greedy Algorithm
IL	Insertion Loss
ORA	Optimal Routing Algorithm
PDR	Power Dynamic Range
XTL	Crosstalk

References

1. Pamidighantam, V.R.; Yeluripati, R.K. Intra-board Free-space Optical Interconnects for Data Centers. In Frontiers in Optics; Optica Publishing Group: Washington, DC, USA, 2022; p. JW5A-97.
2. Shabka, Z.; Zervas, G. Network-aware Compute and Memory Allocation in Optically Composable Data Centers with Deep Reinforcement Learning and Graph Neural Networks. J. Opt. Commun. Netw. 2023, 15, 133–143.
3. Zhao, L.; Shi, P.; Zhang, H. Bi-directional Benes with Large Port-Counts and Low Waveguide Crossings for Optical Network-On-Chip. IEEE Access 2021, 9, 115788–115800.
4. Zhao, L.; Shi, P. A Universal Method for Constructing N-Port Reconfigurable Non-Blocking Optical Switches on a Silicon Chip. IEEE Access 2021, 10, 1850–1859.
5. Zhao, L. Power-Aware Path Selection Routing Algorithm for Benes Integrated Photonic Switches. In Eighth Symposium on Novel Photoelectronic Detection Technology and Applications; SPIE: Bellingham, WA, USA, 2022; Volume 12169, pp. 1102–1107.
6. Weninger, D.; Serna, S.; Jain, A.; Kimerling, L.; Agarwal, A. High Density Vertical Optical Interconnects for Passive Assembly. Opt. Express 2023, 31, 2816–2832.
7. Ji, H.; Wang, Z.; Li, X.; Li, J.; Unnithan, R.R.; Su, Y.; Hu, W.; Shieh, W. Photonic Integrated Self-Coherent Homodyne Receiver without Optical Polarization Control for Polarization-Multiplexing Short-Reach Optical Interconnects. J. Light. Technol. 2022, 41, 911–918.
8. Fang, X.; Yang, F.; Chen, X.; Li, Y.; Zhang, F. Ultrahigh-Speed Optical Interconnects with Thin Film Lithium Niobate Modulator. J. Light. Technol. 2022, 41, 1207–1215.
9. Vijayaraj, A.; Anand, M.V.; Mageshkumar, N.; Deepan, S.; Karuppiah, S. Allocating Resources in Load Balancing using Elastic Routing Table. Ann. Rom. Soc. Cell Biol. 2021, 25, 13051–13063.
10. Tamaz, L.; Nino, Z. Compile a Routing Table using a Static Routing Algorithm. Peerian J. 2021, 1, 27–35.
11. Tran, H.; Nguyen, S.; Yen, I.; Bastani, F. IoT Data Discovery: Routing Table and Summarization Techniques. arXiv 2022, arXiv:2203.10791.
12. Hemalatha, M.; Rukmanidevi, S.; Shanker, N. Searching Time Operation Reduced IPV6 Matching Through Dynamic DNA Routing Table for Less Memory and Fast IP Processing. Soft Comput. 2021, 25, 3455–3468.
13. Zhao, Y.; Tian, B.; Hu, N.; Zhao, Q.; Niu, Y.; Lin, L.; Yang, Y. SQRT: A Secure Querying Scheme of Routing Table Based on Oblivious Transfer. Symmetry 2022, 14, 1245.
14. Lee, B.G.; Dupuis, N.; Pepeljugoski, P.; Schares, L.; Budd, R.; Bickford, J.R.; Schow, C.L. Silicon Photonic Switch Fabrics in Computer Communications Systems. J. Light. Technol. 2015, 33, 768–777.
15. Cheng, Q.; Bahadori, M.; Huang, Y.; Rumley, S.; Bergman, K. Smart Routing Tables for Integrated Photonic Switch Fabrics. In Proceedings of the 2017 European Conference on Optical Communication (ECOC), Gothenburg, Sweden, 17–21 September 2017; pp. 1–3.
16. Ding, M.; Wonfor, A.; Cheng, Q.; Penty, R.V.; White, I.H. Hybrid MZI-SOA InGaAs/InP Photonic Integrated Switches. IEEE J. Sel. Top. Quantum Electron. 2017, 24, 1–8.
17. Cheng, Q.; Huang, Y.; Yang, H.; Bahadori, M.; Abrams, N.; Meng, X.; Glick, M.; Liu, Y.; Hochberg, M.; Bergman, K. Silicon Photonic Switch Topologies and Routing Strategies For Disaggregated Data Centers. IEEE J. Sel. Top. Quantum Electron. 2019, 26, 1–10.
18. Huang, Y.; Cheng, Q.; Hung, Y.H.; Guan, H.; Meng, X.; Novack, A.; Streshinsky, M.; Hochberg, M.; Bergman, K. Multi-Stage 8 × 8 Silicon Photonic Switch Based on Dual-Microring Switching Elements. J. Light. Technol. 2020, 38, 194–201.
19. Lin, B.C. Fault-Tolerant General Beneš Networks. IEEE Trans. Commun. 2023, 71, 6928–6938.
20. Seznec. A New Interconnection Network for SIMD Computers: The Sigma Network. IEEE Trans. Comput. 1987, 100, 794–801.
21. Opferman, D.C.; Tsao-Wu, N.T. On a Class of Rearrangeable Switching Networks Part I: Control Algorithm. Bell Syst. Tech. J. 1971, 50, 1579–1600.
22. Qiao, L.; Tang, W.; Chu, T. 32 × 32 Silicon Electro-Optic Switch with Built-In Monitors and Balanced-Status Units. Sci. Rep. 2017, 7, 42306.
23. Çam, H.; Fortes, J.A.B. Work-efficient routing algorithms for rearrangeable symmetrical networks. IEEE Trans. Parallel Distrib. Syst. 1999, 10, 733–741.
24. Poon, A.W.; Luo, X.; Xu, F.; Chen, H. Cascaded Microresonator-based Matrix Switch for Silicon On-Chip Optical Interconnection. Proc. IEEE 2009, 97, 1216–1238.
25. Gastwirth, J.L. A General Definition of the Lorenz Curve. Econ. J. Econ. Soc. 1971, 39, 1037–1039.
26. Boeck, R.; Jaeger, N.A.F.; Rouger, N.; Chrostowski, L. Series-coupled Silicon Racetrack Resonators and The Vernier Effect: Theory and Measurement. Opt. Express 2010, 18, 25151–25157.

Figure 1. Illustration of the problem statement: (A) A small feasible region. (B) A large feasible region.

Figure 2. The principle of the ETS algorithm: Stage 1: Use traditional bipartite graphs to create auxiliary graphs. Stage 2: Choose the suitable matching according to the optimal algorithm principle. Otherwise, use the near-optimal algorithm. Stage 3: Tune the states of the switches according to the routing table and the operating wavelength.

Figure 3. (a) The ring resonators in the “ON” states. (b) The ring resonators in the “OFF” states. (c) The transmission spectra.

Figure 4. The normalized transmission spectra of two-port switches (a) in the “bar” states and (b) in the “cross” states, obtained using the simulation and the analytical method.

Figure 5. Four variations of the same permutation: (a) The optimal algorithm. (b) Cheng’s method. (c) The looping algorithm. (d) The GRA. 1, 2, 3 refer to the 1st, 2nd, and 3rd stages of the Benes network, respectively.

Figure 6. Improvement of the cumulative contribution of the path power penalty due to the proposed ETS method compared with (a) the looping algorithm and (b) Cheng’s algorithm.

Figure 7. Percentage versus the matching number using the optimal and near-optimal algorithms.

Figure 8. The ETS method decides which is the best path in the routing table using (a) the looping algorithm, (b) the cross-state priority, and (c) the GRA method.

Figure 9. CDF improvement in both (a) IL and (b) XTL due to the GRA with bar-state and cross-state priority.

Figure 10. Four variations of two extreme permutations at the 32-port level: (a) traditional all-bars solution, (b) all-bars solution after using the optimal solution, (c) all-cross solution after using the greedy solution, and (d) traditional all-cross solution.

Figure 11. Transmission spectra of the 32 × 32 switch: (a) traditional all-bars solution, (b) all-bars solution after using the optimal solution, (c) all-cross solution after using the greedy solution, and (d) traditional all-cross solution. The operating wavelength is 1549.99 nm.

Figure 12. Improvements in (a) IL and XTL of all-bar schemes before and after the optimal algorithm, (b) IL and XTL of all-cross schemes before and after using the greedy scheme, (c) ER of all bar schemes before and after the optimal algorithm, and (d) ER of all-cross schemes before and after using the greedy scheme.

Table 1. Related parameters.

Parameters	Values
Coupling efficiency	0.5/0.5
Group refractive/Effective refractive index	2.6/2.6
Free spectral range (FSR)	26 nm
Maximum attenuation	3 dB/cm
The perimeter of the ring (On state)	4/12 µm
The perimeter of the ring (Off state)	3.48/10.45 µm

The parameter setting is almost the same as in [4,26], except that the coupling efficiency and perimeter of rings are slightly different.

Table 2. Comparison with state-of-the-art algorithms.

References	Solutions	Power Imbalances	Physical Indicators	Time Complexity
Looping Algorithm [21]	Always	High	Not given	O($nN$)
Work Efficient Algorithm [23]	Sometimes	Low	Not given	O($nN/p$)
Lee’s Algorithm [17]	Always	Low	MLSE	O($nN^{2}$)
This Work	Always	Low	Without MLSE	min(O($nN/p$), O($nN$))

N denotes the port count, whereas n denotes $\log_{2} N$, and p is an integer in [1, n].

Table 3. Comparison with state-of-the-art on physical metrics.

Key Indicators	[14]	Our Previous Work [3,4]	This Work
Port number	64	32	32
Data rate (Gbps)	25	25	25/50
Link power budget	−15 dBm	−10.9 dBm	−10.9 dBm
Power imbalance	1.5 dB to 95 dB	1.5 dB to 45 dB	5 dB to 36 dB
Eye width	Almost closed	1.7 nm−110 nm	2.4 nm−60 nm
Number of the first-order XTLs	n	n	1 or p
BER	$10^{- 12}$	$10^{- 9}$	$10^{- 9}$
WDM	16	8	8

As before, n denotes $\log_{2} N$, and p is an integer in [1, n].


© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
