*3.1. ECMP-Dijkstra Algorithm*

When deploying the best intercept access point in hybrid SDN, we have to calculate all equivalent shortest paths between two points and then select out the best route from all equal-cost shortest paths to choose the best node as IAP. The most typical single source shortest path algorithm is Dijkstra Algorithm [6]. Accordingly, an improved algorithm of Dijkstra Algorithm was proposed by Li [7]. Under the concept of precursor node, Li exploited the initial shortest path set calculated by Dijkstra Algorithm, to calculate most but not all of the shortest paths. In view of this, on the basis of Dijkstra Algorithm and Li's Algorithm, we propose an improved equal-cost multi-path shortest paths algorithm (ECMP-Dijkstra), which can calculate all equal-cost shortest paths from one node to all nodes. The notations used in the algorithms and in the following equations are listed in Table 2.



The pseudo code of Dijkstra Algorithm is given in Algorithm 1. We input the source node *s* and an undirected graph *G (V,E)* where *V* denotes the set of all nodes and *E* denotes the set of all edges. We explain Algorithm 1 that inf denotes an infinity and *sps,i* denotes the shortest path from the source node *s* to node *i*. In lines 11–14, we ge<sup>t</sup> the minimum hop-count value minhops and the corresponding node key. In lines 15–19, we remove node key from U and then add node key to S and add node key to the shortest path sps,i to ge<sup>t</sup> the shortest path sps,key. Finally, we obtain the shortest path set SP from the source node s to all nodes in V.

#### **Algorithm 1** Dijkstra Algorithm

**Input:** *s*; *G(V,E)* **Output:** *SP* 1: *S(s)* = 0; *U(i) = inf, i* ∈ V, i = s; *SP =* ∅ 2: *SP* ← *SP* ∪ *sps,s* 3: **while** *U* = ∅ do 4: *tsp =* ∅*; minhops= inf; key = None* 5: **for** edge *ei,j* in *E* **do** // node *i, j* ∈ V, i = j 6: **if** *hops(ei,j)* + *S(j)* ≤ *U(i)* **then** 7: *U(i)* ←*hops(ei,j) + S(j)* 8: *tsp(i) = j* 9: **end if** 10: **end for** 11: *numu(k), inu(k)* ←*sort(U(i))* 12: *β* ← *Num(numu(k))* 13: *minhops= numu(β)* 14: *U* ← *U* – *key* 15: *S(key) = minhops* 16: *S(key) = minhops* 17: **for** shortest path *sp0s,i* in *SP* **do** 18: **if** *i == tsp(key)* **then** 19: *sps,key* ← *Merge(sps,i,key)* 20: *SP* ← *SP* ∪ *sps,key* 21: **end if** 22: **end for** 23: **end while** 24: **return** *SP*

Based on Dijkstra Algorithm, we propose an improved equal-cost multi-path shortest paths algorithm (ECMP-Dijkstra) so as to calculate all equal-cost shortest paths from the source node *s* to all nodes. Detailed pseudo code of ECMP-Dijkstra Algorithm is summarized in Algorithm 2. At beginning, we input the source node *s* and the shortest path set *SP* calculated by Dijkstra Algorithm, which contains only one shortest path from *s* to all nodes. In line 1, we use the shortest path set *SP* to calculate the minimum hop-count or cost set *S* from *s* to all nodes by the function *hops()* and *S(i)* denotes the minimum cost from node *s* to node *i*. In line 5, *rsp(i)* denotes all equal-cost shortest paths from the source node *s* to the destination node *i*. We loop through the edge-set *E(ei,j)* and judge whether the hop-count or cost from node *s* to node *i* (i.e., *S(i)*) plus the hop-count of *edgei,j* equals the hop-count from node *s* to node *j* (i.e., *S(j)*). If it does, then we add node *j* to all equal-cost shortest paths from node *s* to node *i* in lines 11–12, thus obtaining multiple shortest paths from node *s* to node *j* and adding them to the shortest path set *SP* in line 13. In lines 2–13, we exploit the precursor node and the initial shortest path set *SP* repeatedly, to add equal-cost shortest paths to *SP* and thus update *SP* constantly. In line 18, we delete the duplicate shortest path from *SP* using the function *DeleteDup()*. Thus, we update the shortest path set *SP* repeatedly until the number of shortest paths in SP does not increases.

**Algorithm 2** ECMP-Dijkstra Algorithm


We use three real-world topologies CRN, COST 239, NSFNet for simulation experiments, where China's 156 major railway nodes network (China Railway Network; CRN) [34] has 156 nodes and 226 links, Pan-European fiber-optic network (COST 239) [35] has 28 nodes and 41 links and T1 NSFNet network topology [36] has 14 nodes and 21 links. Under the three topologies, we compared ECMP-Dijkstra Algorithm with Dijkstra Algorithm and Li's Algorithm and the experimental results are shown in Figure 1 where TSP denotes the total number of shortest paths from one node to all nodes. Moreover, the higher the TSP, the better the intercept access points deployment may be. From the figures, we know that TSP of ECMP-Dijkstra Algorithm is higher than Dijkstra Algorithm and Li's Algorithm, thus, to deploy intercept access point reasonably.

**Figure 1.** The impact of three shortest path algorithm on TSP in three topologies. TSP. The higher, the better.

#### *3.2. SDN Interception Models*

For lawful interception in hybrid SDN, we first need to analyze how to intercept, that is, how to deploy intercept access point between the source (S), the destination (D) and the interception center (L). In this section, we will analyze various network interception models (i.e., the deployment strategies of IAP) in hybrid SDN. The deployment of intercept access point includes the single "IAP selection problem" in the shortest path S-D (i.e., the shortest path from S to D) and its derived "the shortest path algorithm problem between three points (i.e., S, D and L)". The above two problems can be viewed as the same problem. Once the location of the intercept access point is determined, then the fourth point (IAP; I) can meet the service traffic between S, D and L. Under the condition that S-I, D-I, and L-I path are the shortest at the same time, the shortest path between three points can be solved to meet the needs of interception system.

We aim to solve the problem of selecting single intercept access point and routing between three-points, namely to deploy the best intercept access point in the shortest paths between S, D, and L. Analyzing interception models in hybrid SDN, we divide them into two interception models by the deployment location of intercept access point: legacy interception models and SDN interception models as shown in Figures 2 and 3.

**Figure 3.** SDN interception models.

The legacy interception models include: S/I model, D/I model, and L/I model as shown in Figure 2a–c. As we all know, the interception service in legacy networks is limited by the deployment location of intercept access point due to the unimaginable cost of setting up special equipment and dedicated return link, to intercept network traffic. Thus, S or D or L is usually adopted as intercept access point I used to respond to the requirements of the interception center and to perform the traffic interception action in legacy network.

In this paper, we mainly study and analyze the SDN interception models, which includes T model, ECMP-T model and Fermat-point model as shown in Figure 3a–c. In view of the performance metrics of lawful interception system, the three SDN interception models are used to thoroughly study to find the optimal algorithm of deploying intercept access point.

Figure 3a shows T model: its name comes from the topology similar to the T-word. Under the concept of SDN networking in an undirected and weighted network *G(V,E)*, any SDN node on the shortest path S-D can be selected as intercept access point (I) under the premise of not affecting the existing shortest path arrangemen<sup>t</sup> of S-D (i.e., maintaining the existing end-to-end transmission quality). While only the node with the minimum hop-count (or cost) to the interception center (L) should be adopted as the best I-point to run the function to capture traffic transferred to the interception center.

Figure 3b shows ECMP-T model: based on the operation mode of T model, the path I-L must be the shortest path, but this shortest path S-I-D does not necessarily meet the optimal path. In fact, there may be more than one shortest path S-D, namely, the shortest path S-D is equal-cost multi-path (ECMP). Hence there may be a southward equal-cost shortest path in the T-word path theoretically, in which there is another intercept access point (I) and the hop-count (or cost) of I-L path is lower than the current one, so this interception model is called ECMP-T model that the nearest I-point from the interception center (L) is selected as the best intercept access point I among all the equivalent shortest paths between S and D.

Detailed pseudo code of T or ECMP-T model is presented in Algorithm 3. At the beginning of the algorithm, the set I used to store the best intercept access point is set to be empty in line 1. In lines 2–4, we calculate the shortest path *spS,D* from *S* to *D* using Dijkstra or ECMP-Dijkstra Algorithm and then obtain the node-set NS,D in the shortest path *spS,D*, and next select out SDN nodes from the node-set *NS,D* to ge<sup>t</sup> the SDN node-set *SN*. If the SDN node-set *SN* is not empty, we traverse SDN nodes in *SN* and implement lines 6–16; otherwise, we fail to deploy intercept access point (IAP) between *S*, *D* and *L*, and thus save the wrong node combination of *S*, *D* and *L* in line 18. Line 7 calculates the lowest hop count (or cost) from SDN nodes to L and ge<sup>t</sup> the cost vector *h(i).* In line 9, we sort the cost vector *h(i)* by size of hop count in descending order and then ge<sup>t</sup> the sorted vector *numh(j)* and the corresponding label vector *inh(j)*, where j denotes the subscript of j-th element of a vector. Line 11 takes the minimum hop-count value *minhops* from the sorted vector *numh(j)*. Finally, in lines 13–14, we select the node with the minimum cost *minhops* as the best intercept access point and then add the selected IAP *inh(j)* to the set I.

**Algorithm 3** T or ECMP-T Model


```
9: numh(j), inh(j)
                      ← sort(h(i))
10: β ← Num(SN)
11: minhops
               ← numh(β)
12: for key j in numh do
13: if numh(j) = minhops then
14: I ← I ∪ inh(j)
15: end if
16: end for
17: else
18: SaveFail(S,D,L)
19: end if
20: return I
```
The only difference of pseudo code of T model and ECMP-T model is whether to use Dijkstra Algorithm or ECMP-Dijkstra Algorithm to calculate the shortest path.

Figure 3c shows Fermat-point model: In geometry, Fermat-point refers to the point with the smallest sum of the distances from the three vertices of the triangle. Accordingly, we extend it to the node with the smallest sum of the distances from the three nodes of S, D and L in SDN network, and at the same time with meeting the constraints of the shortest path of S-D, S-L and D-L between the three points. Theoretically, Fermat-point model is optimal.

Details of pseudo code of Fermat-point model are summarized in Algorithm 4. In lines 2–4, we calculate all equal-cost shortest paths of S-D, S-L, D-L using ECMP-Dijkstra Algorithm, and then obtain all node sets in the equal-cost shortest paths in lines 5–7, and next combine these node sets to ge<sup>t</sup> the node-set *NS,D,L* in line 8, and further select out SDN nodes from the node-set *NS,D,L* to ge<sup>t</sup> the SDN node-set *SN*. If the SDN node-set *SN* is not empty, we traverse SDN nodes in *SN* and implement lines 11–24; otherwise, we fail to deploy intercept access point (IAP) between S, D and L, thus to save the wrong node combination of S, D and L. Lines 12–14 calculate the lowest hop count (or cost) of i-S, i-D, i-L, and then add the results to the sum, to ge<sup>t</sup> the cost vector *h(i)* in line 15. In lines 17–24, we sort the cost vector *h(i)* by size of cost value in descending order and then take the minimum cost value *minhops*, and next select the node *inh(j)* with the minimum cost *minhops* as the best intercept access point and finally add the selected IAP *inh(j)* to the set *I*.

**Algorithm 4** Fermat-point Model

18: *I* ← *I* ∪ *inh(j)*



We use *spi-j* to denote the shortest path from node *i* to node j, and *hopsi-j* denotes the lowest hop-count or cost from node *i* to node *j*. We use '→' to denote that the next-node is N-SDN node and use '⇒' to denote that the next-node is SDN node. Examples of three interception models are illustrated in Figure 4, where we select node 154, node 9, node 105 all marked by red as S, D and L respectively and select 30 nodes randomly in Figure 4 as SDN nodes which includes node i∈{4, 8, 11, 19, 23, 25, 31, 38, 49, 50, 58, 60, 65, 67, 77, 82, 89, 92, 100, 103, 117, 120, 121, 125, 128, 134, 140, 150, 152, 156}, to construct a hybrid SDN.

**Figure 4.** China's 156 major railway nodes network (China Railway Network; CRN).

We run T interception model: One shortest path from node 154 to node 9 is *sp154-9* marked by pink in Figure 4 that is 154 → 153 ⇒ 152 → 146 → 142 → 136 ⇒ 134 → 124 ⇒ 121 ⇒ 117 → 94 → 81 ⇒ 82 → 74 → 52 ⇒ 49 → 32 → 30 ⇒ 31 ⇒ 25→9, and *hops154-9* = 20. Among all nodes in *sp154-9*, node 117 that is an SDN node has the lowest hop count to node 105 due to *hops117-105* = 6, and thus node 117 can be used as the best intercept access point I in T interception mode.

We run ECMP-T interception model: There are 22 equivalent shortest paths from node 154 to node 9, but we only show three shortest paths (i.e., *sp154-9* contains *sp1154-9*, *sp2154-9*, and *sp3154-9*) from node 154 to node 9 marked by pink, bright green, turquoise respectively in Figure 4. *sp1154-9* is 154 → 153 ⇒ 152 → 146 → 142 → 136 ⇒ 134 → 124⇒ 121 ⇒ 117 → 94 → 81 ⇒ 82 → 74 → 52 ⇒ 49 → 32 → 30 ⇒ 31 ⇒ 25 → 9, and *sp2154-9* is 154 → 153 → 155 → 144 ⇒ 140 → 133 ⇒ 134 → 124 ⇒ 121 ⇒ 117 → 99 → 97 → 69 → 68 → 61 ⇒ 60 → 56 ⇒ 23 → 24 ⇒ 25→9, and *sp3154-9* is 154 → 153 → 155 → 144 ⇒ 140 → 133 ⇒ 134 → 115 → 113 → 112 ⇒ 100 → 101 → 64 → 63 → 62 → 59 → 17 → 16 ⇒ 19 → 10 → 9, and *hops154-9* = 20. Among all nodes in *sp154-9*, node 100 that is SDN node in *sp3154-9* has the lowest hop count to node 105 due to *hops100-105* = 4, and thus node 100 can be used as the best intercept access point (I) in ECMP-T interception mode. Apparently, *hops100-105* < *hops117-105*, namely, this I-point outperforms the one in the T model.

We run Fermat-point interception model: the node-set *N154-9-105* with no repeat is obtained by all sp*154-9*, sp*154-105,*sp*9-105* (i.e., spS-D, spS-L, spD-L). Namely, *N154-9-105* contains all nodes of all the shortest paths from node 154 to node 9, from node 154 to node 105, and from node 9 to node 105. And then, the sum of hop-count from node 103 in *N154-9-105* to node 154, node 9, node 105 (i.e., *hops103-154,9,105*) is the smallest and *hops103-154,9,105* = 23, This means that node 103 that is SDN node in *N154-9-105* can be used as the best intercept access point I in Fermat-point interception mode.

We have solved the problem of single intercept access point deployment above, and then expand to deploy intercept access points in hybrid SDN.

Running different network interception models, we will study and analyze the influence on the best transmission quality of intercepted data (the minimum cost from intercept access point (I) to interception center (L); MILC), the total cost of running intercept operation in global network (TOC), and the quality of service of normal user's data stream (UQoS) with different proportion of SDN node. According to the proposed three models in Figure 3, MILC, TOC, UQoS are calculated in respectively in (1), (2) and (3), where *N* denotes the maximum node label or index, and any node can be selected as S, D and L in hybrid SDN topology, i.e., there are *N3* possibilities for node-combination of S, D and L. After the node-combination selection (S, D, L), the best intercept access point (I) can be go<sup>t</sup> by the SDN interception models, then the hop count or cost of the shortest path S-I, D-I and L-I can be calculated by the function *hops(i,j)*, thus calculating MILC, TOC and UQoS.

#### **4. Restricted Minimum Vertex Cover Algorithm**

There is no exception that most network optimization deployment problems can be viewed as the minimum vertex cover problem (MVCP) in graph theory. In the process of migration of SDN technology for large-scale Internet, it may be faced with the situation of hybrid deployment of SDN nodes and non-SDN nodes (N-SDN). In this hybrid SDN, not all nodes have software-defined functions to play the role of intercept access point. Only some nodes with the function of software definition can respond to the requirements of the interception center and to run interception operation. Therefore, it is very critical to select the best deployment location of intercept access point. And IPA must occupy the position of the hub, greatly covering all traffic through the critical path, and under a certain proportion of threshold, it may not achieve 100% intercept link coverage. Therefore, the minimum vertex cover problem must be transformed into the restricted minimum vertex cover problem question (RMVCP).

$$\text{MILCC} = \sum\_{S=1}^{N} \sum\_{D=1}^{N} \sum\_{L=1}^{N} ho \, ps(L, I) \tag{1}$$

$$\text{TOC} = \sum\_{S=1}^{N} \sum\_{D=1}^{N} \sum\_{L=1}^{N} hops(S, I) + hops(D, I) + hops(L, I) \tag{2}$$

$$\text{UQcS} = \sum\_{S=1}^{N} \sum\_{D=1}^{N} \sum\_{L=1}^{N} hops(S, I) + hops(I, L) \tag{3}$$

Considering overall situation (e.g., the location relevance of all intercepted targets, the operation and maintenance cost of operators) from the whole interception system, we intend to develop a restricted minimum vertex cover algorithm (RMVCA) to achieve the best intercept link coverage of the whole network with the minimum number of intercept access points as well as optimize the efficiency of deployment when deploying intercept access points in the hybrid SDN.

RMVCP: given a network graph *G(V,E)*, where *V* denotes the set of all nodes, and E denotes the set of all links in the network. There exists non-SDN nodes and SDN nodes at the same time in the network whereV=S ∪N, and *S* denotes the set of SDN nodes, and N denotes the set of non-SDN nodes. To find a *P* set ( *P* ⊆ *S* ⊆ *V*), so that every link in the network is covered (intercepted) by at least an SDN node in the *P* set.

Figure 5 shows an example of solving RMVCP. In this hybrid SDN, SDN nodes (i.e., solid circle) set *S* = {1, 3, 8, 9, 11, 12, 13, 17, 19, 20, 21, 22, 25, 26, 27, 28} and non-SDN nodes (i.e., light circle) set *N*= {2, 4, 5, 6, 7, 10, 14, 15, 16, 18, 23, 24}. Using RMVCA, the SDN nodes set *P* = {1, 8, 9, 11, 13, 20, 22, 25, 26, 27} is recommended to be selected as the intercept access points set, but 7 links (marked as dotted lines) in the example failed to be covered due to the hybrid deployment of SDN and N-SDN nodes, and thus only about 80% of the links (marked as solid lines) are completely covered by 10 intercept access points.

**Figure 5.** Hybrid SDN covered by minimum vertexes.

RMVCA ensure the result a near-optimal solution or one of the approximate solutions, so as to meet the optimal solution of the deployment problem of intercept access points.

Based on the concept of degree, we, at a time, use greedy algorithm to select one approximate or equivalent optimal intercept access point to reduce the scale of the problem recursively, so as to obtain the minimum vertex approximation set covering all SDN links and achieve the best intercept link coverage with the minimum number of intercept access point.

Details of pseudo code of RMVCA are summarized in Algorithm 5. We input an undirected and weighted network cover set *P* is set to empty originally. In line 2–3, we ge<sup>t</sup> the set *N* of N-SDN nodes and the accordingly edge-set *EN-SDN* of N-SDN nodes by the set *N*. Line 4 removes *EN-SDN* from the edge-set E, to ge<sup>t</sup> the edge set *ESDN* of SDN nodes. Lines 6–7 traverse each SDN node and calculate its degree *d(i)*. In line 9, we sort the degree vector *d(i)* in the ascending order and ge<sup>t</sup> the sorted degree vector *numd(j)* and the accordingly label vector *ind(j)* where j denotes the index or subscript of j-th element. Line 11 selects the maximum degree *numd(β)* from *numd(j)* where *β* denotes the number of SDN nodes in the set *S*. In lines 12–18, we judge the degree of node and implement accordingly measures. If the maximum degree of node is not equal to zero in line 12, we first add the node *ind(β)* where *β* denotes the subscript of β-th element to the minimum vertex cover set *P* in line 13 and then calculate the adjacent edge-set *θ(ind(β))* of node *ind(β)* in line 14, and next remove the adjacent edge-set *θ(ind(β))* from the edge-set *E* in line 15, which leads to the degree reduction of each SDN node. Finally, we return line 5 to judge whether *ESDN* is empty and then calculate the degree of each SDN node again. Otherwise, if the maximum degree of node is equal to zero, we break the loop and end the algorithm.

```
Algorithm 5 RMVCA (Restricted Minimum Vertex Cover Algo-rithm)
Input: G(V,E); S: the SDN node-set
Output:P
1: P = ∅
2: N ← V – S
3: EN-SDN ← Edge(N(i))
4: ESDN ← E –EN-SDN
5: while ESDN = ∅ do
6: for node i in S do
7: d(i) ← i
8: end for
9: numd(j), ind(j) ← sort(d(i))
10: β ← Num(S)
11: maxnumd ← numd(β)
12: if maxnumd > 0 then
13: P ← P ∪ ind(β)
14: θ(ind(β)) ←ind(β)
15: ESDN ←ESDN – θ(ind(β))
16: else
17: break
18: end if
19: end while
20: return P
```
Using RMVCA proposed above, we will study and analyze the influence of different SDN node proportion on the maximum intercept link coverage of the whole network (i.e., max-ILC) and the accordingly needed minimum number of SDN nodes in the P set for realizing the maximum intercept link coverage (i.e., numP), as well as the influence of RMVCA on the intercept link coverage (i.e., ILC) and the efficiency of deploying intercept access points in whole hybrid SDN.

#### **5. Simulation and Results**

#### *5.1. Simulation Environment and Performance Metrics of Lawful Interception System*

In our simulation, we choose three real-world backbone topologies CRN, COST 239, NSFNet to evaluate the performance of three SDN interception models. Under the three network topologies, we randomly select different number of nodes as SDN nodes to construct the hybrid SDN network and the weight of each link is set to 1 by default, and the source node (S), the destination node (D) and the interception center (L) are selected randomly and thus there are 3,796,416 (1563), 21,952, 2,744 node combinations of S, D and L.

Under different proportion of SDN nodes, we will study and analyze the influence of different SDN interception models on the best transmission quality of intercepted data (the minimum cost from intercept access point (I) to the interception center (L); MILC), the total cost of running intercept operation in global network (TOC), and the quality of service of normal user's data stream (UQoS), the deployment efficiency of IAP (the total number of times to calculate the shortest path during the process of deploying IAP; TTC)), and the total number of failures to deploy IEP (i.e., NFD).

According to the proposed three SDN interception models, MILC, MRLC, TOC, and UQoS are calculated respectively in (1), (2) and (3). Based RMVCA, we run different SDN interception models and calculate and count up MILC, TOC, UQoS, TTC and NFD of each node combination of S, D and L and then compare and analyze the results to evaluate the performance of three SDN interception models.
