**A Clustering Routing Algorithm Based on Improved Ant Colony Optimization Algorithms for Underwater Wireless Sensor Networks**

#### **Xingxing Xiao 1,2,3 and Haining Huang 1,2,3,\***


Received: 15 August 2020; Accepted: 30 September 2020; Published: 1 October 2020

**Abstract:** Because of the complicated underwater environment, data transmission from underwater sensor nodes to a sink node (SN) faces great challenges. Aiming at the problem of energy consumption in underwater wireless sensor networks (UWSNs), this paper proposes an energy-efficient clustering routing algorithm based on an improved ant colony optimization (ACO) algorithm. In clustering routing algorithms, the network is divided into many clusters, and each cluster consists of one cluster head node (CHN) and several cluster member nodes (CMNs). This paper optimizes the CHN selection based on the residual energy of nodes and the distance factor. The selected CHN gathers data sent by the CMNs and transmits them to the sink node by multiple hops. Optimal multi-hop paths from the CHNs to the SN are found by an improved ACO algorithm, which improves on the traditional ACO algorithm in three aspects: the heuristic information, the evaporation parameter in the pheromone update mechanism, and the ant searching scope. Simulation results indicate the high effectiveness and efficiency of the proposed algorithm in reducing energy consumption, prolonging the network lifetime, and decreasing the packet loss ratio.

**Keywords:** underwater wireless sensor networks; ant colony optimization algorithms; clustering routing algorithms; energy efficiency; network lifetime

#### **1. Introduction**

Nowadays, underwater wireless sensor networks (UWSNs) have attracted widespread interest along with the exploration and utilization of marine resources [1,2]. UWSNs are composed of numerous underwater acoustic sensor nodes deployed in underwater monitoring areas, which perform functions such as navigation, surveillance, resource exploration, intrusion detection, and data collection [3]. However, underwater sensor nodes are small devices with limited energy, and they are difficult to replace, which makes energy efficiency a major concern [4,5]. Moreover, UWSNs suffer from high propagation delay, low bandwidth, and high error rates [6]. Therefore, designing an energy-efficient routing algorithm for data transmission in a complex underwater environment is extremely important for UWSNs [7]. There exist many conventional routing algorithms for terrestrial wireless sensor networks (TWSNs), but they are usually infeasible in UWSNs [8]. The reasons are as follows. Firstly, TWSNs employ radio signals to transmit data, but UWSNs use acoustic signals because radio signals attenuate quickly underwater [9]. Secondly, TWSNs usually employ a 2D network model, whereas UWSNs adopt a 3D network model, which is a great challenge to researchers. Thirdly, the replacement of sensor nodes is more difficult in UWSNs than in TWSNs.

For conserving energy, multi-hop data transmission is more effective than single-hop transmission in long-distance communication for UWSNs [10]. Additionally, to alleviate the problems of data collision and unbalanced traffic load, it is important to design a reliable network topology [11]. Many studies have shown that clustering routing algorithms are capable of saving energy, avoiding collisions, and balancing the traffic load because they employ a clustering topology and use a multi-hop mechanism during inter-cluster data transmission [12,13]. In clustering routing algorithms, the network is divided into many clusters and each cluster consists of one cluster head node (CHN) and several cluster member nodes (CMNs) [14]. When clusters are formed, the CHNs allocate channel resources to the CMNs and the CMNs transmit data according to this allocation, which decreases collisions [15]. After receiving the data from the CMNs in the same cluster, the CHN is responsible for aggregating the data, which reduces data redundancy and decreases the number of data packets to be sent to the sink node (SN), thereby conserving energy [16]. Meanwhile, the decreased number of data packets helps reduce collisions when the CHNs transmit them to the SN. Additionally, the multi-hop mechanism is used when CHNs send data to the SN, which saves energy compared to the single-hop mechanism. Moreover, clustering routing algorithms usually employ a CHN rotation mechanism, which avoids excessive energy consumption at the CHNs, balances the energy dissipation, and prolongs the network lifetime [17].

Many studies indicate that clustering routing algorithms are superior in controlling data traffic and reducing data transmission, and are thus capable of saving energy, extending network lifespan, and decreasing the packet loss ratio [18–28]. The low-energy adaptive clustering hierarchy (LEACH) algorithm, the earliest clustering routing algorithm, employs a probabilistic method to select CHNs and does not consider the residual energy of nodes, which causes some low-energy nodes to become CHNs [18]. This goes against the energy balance and the energy efficiency, for these inefficient CHNs may die prematurely. Therefore, researchers proposed improved clustering routing algorithms. Domingo et al. proposed a distributed underwater clustering scheme (DUCS), which considers the residual energy of candidate nodes when selecting CHNs [19]. However, the distance between the candidate CHN and the SN is not considered in the DUCS algorithm. Xu et al. came up with a clustering routing algorithm where the CHN selection is optimized by considering the remaining energy and the positions and the density of nodes [20]. Additionally, the mechanism of data transmission from the CHN to the SN is improved, thereby minimizing the energy consumption and maximizing the network lifetime. Wang et al. presented a clustering routing protocol based on hybrid multiple hops, where the CHN selection based on the remaining energy of nodes is self-organized and the path from CHN to the destination node is obtained by the establishment of a minimum spanning tree [21]. This algorithm can reduce energy consumption and extend the network expectancy, but is designed for TWSNs instead of UWSNs. Wan et al. designed an adaptive clustering underwater network (ACUN) algorithm, which considers the residual energy of nodes and the energy loss of paths to select CHNs [22]. It also considers the node energy condition to select paths with high energy efficiency. However, the distance factor has not been considered in this literature. Bhattacharjya et al. proposed a cluster-based underwater wireless sensor network (CUWSN) algorithm, which selects CHNs based on the residual energy of nodes and adopts multi-hop transmission to forward data packets to the destination node [23]. The CUWSN can reduce energy consumption and improve the performance of the network, but the distance factor has not been taken into account and the multi-hop paths have not been optimized. In [24], Ayaz et al. did a survey on routing algorithms in UWSNs, which aimed to solve problems such as data transmitting and node deployment, as well as localization. In the survey, the authors analyzed and compared several clustering routing algorithms such as the DUCS [19], the distributed minimum cost clustering protocol (MCCP) [25], temporary cluster-based routing (TCBR) [26], the location-based clustering algorithm for data gathering (LCAD) [27], and the multipath virtual sink architecture [28]. The MCCP was proposed by Pu et al. in [25], which addresses the hotspots near the SN and balances the traffic load. In addition, the MCCP determines the number of CMNs according to the locations of the CHNs and the SN. However, the multi-hop method is not
supported in the MCCP and the period of re-clustering is too long. TCBR was presented by Ayaz et al. in [26], where multiple SNs are placed on the water's surface in order to solve the problem that nodes near the SN consume more energy and die prematurely. TCBR can balance the energy dissipation, but it cannot achieve high efficiency in time-critical applications. The LCAD was given by Anupama et al. in [27], where horizontal acoustic communication is employed when CMNs transmit data to CHNs, and autonomous underwater vehicles (AUVs) are used when CHNs send data to the SN. The LCAD can solve the energy hole problem and reduce energy dissipation. However, it relies on the network structure, and its effectiveness suffers if node mobility is considered. The multipath virtual sink architecture was proposed by Seah and Tan in [28], where aggregation nodes aggregate the data from other nodes in the same cluster and then transmit the aggregated data to the SNs. The authors assume that these SNs can achieve high-speed communications so that they form a virtual SN. This method can guarantee high reliability, but the duplicate data packets result in redundant transmission, which increases resource consumption. A pressure routing algorithm for UWSNs was presented by Uichin et al. in [29], which employs anycast routing to send data to the SN according to pressure levels. Pressure routing can achieve high delivery ratios and low end-to-end delay, but it consumes more energy because of the use of opportunistic routing and the repeated transmission of copies of the same packets. A cluster sleep–wake scheduling algorithm for UWSNs was proposed by Zhang et al. in [30], which rotates temporary control nodes that manage the sleep–wake scheduling, thus minimizing the energy dissipation. The energy optimization clustering algorithm (EOCA) was put forward by Yu et al. in [11], where the number of neighboring nodes, the remaining energy of nodes, the motion of nodes, and the distance factor are taken into account. Additionally, the EOCA imposes a maximum effective communication range based on the remaining energy of nodes, thereby controlling the energy dissipation of packet delivery. However, the EOCA does not optimize the multi-hop paths for data transmission to the SN.

Greedy algorithms, which make locally optimal choices at every step, have shown great strength in addressing combinatorial optimization problems [31,32]. In specific circumstances they can even find globally optimal solutions [33]; Dijkstra's algorithm and Prim's algorithm are two examples [34,35]. Dijkstra's algorithm, proposed by Edsger Wybe Dijkstra in 1959, has been widely used to find the shortest paths between network nodes. It can thus be employed in routing algorithms to find the shortest path to the destination node [36]. Prim's algorithm constructs minimum spanning trees and can usually find the best solutions [37]. Nevertheless, greedy algorithms are considered short-sighted because they make the best choice only at each step without considering the overall situation, which is why they sometimes fail to obtain the optimal solution. Hence, researchers have proposed many metaheuristics extending greedy algorithms, which can be applied to a wide range of problems [38–41]. The greedy randomized adaptive search procedure (GRASP) was presented by Feo et al. in [38], where the problem at hand is solved in every iteration. Each iteration has two stages: stage one provides an initial solution, and stage two looks for an improved solution by applying a local search procedure to the solution provided by stage one. The fixed set search (FSS) was proposed by Jovanovic et al. in [39], which adds a learning mechanism to the GRASP and is thus more effective than the GRASP in both solution quality and computational cost. In the work of Arnaout in [40], worm optimization (WO), based on worm behaviors, was proposed to solve unrelated parallel machine scheduling problems; it can find optimal solutions and reduce the makespan. In [41], particle swarm optimization (PSO) and a fuzzy algorithm are used in a clustering scheme for UWSNs, which can find the optimal number of clusters and select the optimal CHNs, thereby reducing the energy dissipation and prolonging the lifespan of UWSNs.

The ant colony optimization (ACO) algorithm is also a population-based metaheuristic that extends the greedy approach, and it has been widely used to optimize routing paths [42–44]. The ACO can find optimal paths from source nodes to destination nodes so that the energy consumption can be reduced and the network lifetime can be prolonged. ACO algorithms simulate ant behavior, as ant colonies can usually find optimal paths to food sources [45]. Ants release pheromones on the paths they travel. Other ants are more likely to choose a path with a higher pheromone concentration, and the following ants also release pheromones on it, which further increases the pheromone concentration [46]. The higher pheromone concentration attracts more ants, which forms a positive feedback loop. After a period of time, the ant colony will find the shortest path to the food source.

So far, many researchers have applied ACO algorithms to routing algorithms. Agarwal et al. combined ACO algorithms with the LEACH algorithm to prolong the lifetime of TWSNs, and they validated the effectiveness of the algorithm through simulation experiments [47]. Okdem et al. applied ACO algorithms to routing by taking into account the hop count and the residual energy of neighbor nodes, which can reduce the energy consumption to a certain extent, but the algorithm can only balance the local energy consumption [48]. Camilo et al. improved the pheromone update process of ACO algorithms when designing routing algorithms and took into account the total energy of all nodes, thereby improving the energy efficiency of the entire network [49]. Shan proposed a threat cost calculation for submarine path planning based on ACO algorithms [50]. He presented a new cost function that took into account the path length and the distance factor, and adopted a coalescing differential evolution mechanism when updating the pheromone so as to settle the local optimum problem. Zhang et al. proposed a clustering algorithm on the basis of the ACO algorithm, which was designed for TWSNs instead of UWSNs. When selecting CHNs, they considered the residual energy of candidate nodes and the distance factor. When looking for routing paths, the authors took into account the path length as well as the node energy, which can balance the network energy consumption [51]. Sun et al. presented a routing protocol based on ACO algorithms for TWSNs, where the remaining energy of nodes, the transmission direction, and the distance between nodes were considered in order to find ideal routing paths and reduce the energy consumption of the network [52]. Liu proposed an effective transmission strategy using ACO algorithms, which can improve the energy efficiency and prolong the network lifetime. Additionally, the improved ACO algorithm differed from the traditional one: it used no heuristic information, and every ant took just one step in its whole trip [53]. The studies mentioned above indicate that ACO algorithms can be employed to find optimal routing paths in networks. Nevertheless, the problem of clustering routing in UWSNs has not been resolved, so we need to make some improvements to the existing ACO algorithms and apply them to UWSNs.

To our knowledge, few studies have applied ACO algorithms to UWSNs when designing clustering routing algorithms. It is of great significance to design an energy-efficient routing algorithm that can minimize the energy consumption and ultimately maximize the network lifetime. Therefore, this paper presents a clustering routing algorithm based on an improved ACO algorithm for UWSNs. Firstly, we describe the network model and the energy consumption model that is used to quantify energy consumption and evaluate the energy efficiency of the proposed algorithm. Secondly, we present an improved ACO algorithm, improving the heuristic information, the evaporation parameter in the pheromone update mechanism, and the ant searching scope. In the proposed heuristic information, we consider not only the residual energy but also the distance factor. Additionally, the proposed adaptive strategy for the evaporation parameter in the pheromone update mechanism helps improve the global search ability and the convergence rate of the algorithm. Thirdly, we design the clustering routing algorithm, which has two main phases in each round: a CHN selection phase and a data transmission phase. In the first phase, we optimize the CHN selection by considering the residual energy of nodes, the distance from the node to the SN, and the average distance between the node and the other nodes in the cube. In the second phase, the single-hop method is adopted for data transmission from CMNs to CHNs, while the multi-hop method is employed when CHNs transmit data to the SN; the optimal multi-hop paths are found by the improved ACO algorithm. Finally, simulation results show that compared to five other algorithms, the proposed algorithm can effectively reduce the energy consumption of the network, prolong the network lifetime, and decrease the packet loss ratio.

The remainder of the paper is organized as follows. The network model and energy consumption model are presented in Section 2. Section 3 proposes the improved ACO algorithm. The proposed clustering routing algorithm is given in Section 4. Simulation results and analyses are provided in Section 5. Section 6 draws the conclusion.

#### **2. Model Assumptions**

#### *2.1. Network Model*

This paper presents a large-scale 3D network model for UWSNs where the underwater sensor nodes are randomly deployed in an underwater monitoring area. Figure 1 illustrates the network model.


**Figure 1.** The schematic diagram of the network model.

#### *2.2. Energy Consumption Model*

To quantify the energy consumption, this paper refers to the underwater acoustic energy consumption model given in [55]. We assume that the minimum power for one node to receive a data packet is *P*0. Then the minimum transmission power needs to be *P*0*A*(*l*). *A*(*l*) is the attenuation function, which is presented by:

$$A(l) = l^k a^l \tag{1}$$

where *l* is the distance between transmitter node and receiver node and *k* is the energy spreading factor (1 for cylindrical, 2 for spherical, 1.5 in general), and

$$a = 10^{a(f)/10} \tag{2}$$

is decided by the absorption coefficient, which is presented by:

$$a(f) = 0.11\frac{f^2}{1+f^2} + 44\frac{f^2}{4100+f^2} + 2.75 \times 10^{-4}f^2 + 0.003\tag{3}$$

where *f* is carrier frequency in kHz. Then we can define the energy consumption for sending and receiving:

$$E_t(l) = T_t P_0 A(l) \tag{4}$$

$$E_r = T_r P_0 \tag{5}$$

where *Et* (*l*) and *Er* are energy consumption for transmitting and receiving, respectively. *Tt* and *Tr* are the time duration for a node to transmit and receive one data packet, respectively. The time duration can be calculated by the data packet length and the data transmission rate.
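For concreteness, the following sketch evaluates Equations (1)–(5) in Python. The unit conventions (distance in km so that the dB/km absorption applies directly, frequency in kHz, power in watts, time in seconds) are assumptions, since the text does not state them explicitly.

```python
import math

def absorption_coeff(f):
    """Thorp's absorption coefficient a(f) in dB/km, f in kHz (Equation (3))."""
    f2 = f * f
    return 0.11 * f2 / (1 + f2) + 44 * f2 / (4100 + f2) + 2.75e-4 * f2 + 0.003

def attenuation(l, k=1.5, f=10.0):
    """A(l) = l^k * a^l with a = 10^(a(f)/10) (Equations (1) and (2))."""
    a = 10 ** (absorption_coeff(f) / 10)
    return (l ** k) * (a ** l)

def tx_energy(l, T_t, P0=50e-6, k=1.5, f=10.0):
    """E_t(l) = T_t * P0 * A(l) (Equation (4))."""
    return T_t * P0 * attenuation(l, k, f)

def rx_energy(T_r, P0=50e-6):
    """E_r = T_r * P0 (Equation (5))."""
    return T_r * P0

# With the simulation values from Section 5: a 1024-bit packet at 2048 bps
# takes 0.5 s, P0 = 50 uW, f = 10 kHz.
T_packet = 1024 / 2048
print(tx_energy(1.0, T_packet))  # energy to send one packet over 1 km
print(rx_energy(T_packet))       # energy to receive one packet
```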

#### **3. The Improved Strategy of the ACO Algorithm**

#### *3.1. Overview of ACO*

The ACO algorithm is widely used to find an optimal path between a source node and a destination node. When searching for the destination node, artificial ants deposit a chemical substance called a pheromone on the paths that they pass [56]. The pheromone is the medium that ants use to communicate, and it guides other ants. Ants are more likely to follow a path with a higher pheromone concentration, and the following ants also release pheromones on the path, which increases the pheromone concentration. The increased pheromone concentration attracts more ants, which forms a positive feedback loop [57]. The pheromone matrix is a two-dimensional matrix used to record the pheromone values on every partial path. We use τ*ij*(*t*) to denote the pheromone concentration between node *i* and node *j* at time *t*, where *t* is the iteration counter. Moreover, the pheromone volatilizes with time. After all ants have completed a path search, the pheromone matrix is updated. The global pheromone update rule is presented as follows:

$$\tau_{ij}(t+1) = (1 - \rho)\tau_{ij}(t) + \Delta\tau_{ij}(t) \tag{6}$$

$$\Delta\tau_{ij}(t) = \sum_{k=1}^{q} \Delta\tau_{ij}^{k}(t) \tag{7}$$

$$\Delta\tau_{ij}^{k}(t) = \begin{cases} Q/L_k, & (i,j) \in \text{tour by ant } k \\ 0, & \text{otherwise} \end{cases} \tag{8}$$

where ρ (0 < ρ < 1) is the evaporation parameter, *q* is the total number of ants, *Q* is the total amount of pheromone, and *Lk* is the total length of the path that the *k*th ant passes during this iteration. Nevertheless, too high a pheromone concentration may trap the algorithm in a local optimum, while too low a pheromone concentration may fail to attract other ants. Thus, we employ the method introduced in the max–min ant system (MMAS) to limit the pheromone value [58], which is presented as follows:

$$\tau_{ij}(t+1) = \begin{cases} \tau_{\max}, & \tau_{ij}(t+1) > \tau_{\max} \\ (1 - \rho)\tau_{ij}(t) + \Delta\tau_{ij}(t), & \text{otherwise} \\ \tau_{\min}, & \tau_{ij}(t+1) < \tau_{\min} \end{cases} \tag{9}$$

where τmax and τmin represent the maximum and the minimum of the pheromone values, respectively. Limiting the pheromone values avoids stagnation of the searching process and improves the global convergence of the algorithm.

In ACO, the transition probability from node *i* to node *j* for the *k*th ant can be given by:

$$p_{ij}^{k} = \begin{cases} \dfrac{(\tau_{ij}(t))^{\alpha}(\eta_{ij})^{\beta}}{\sum_{m \in U_k}(\tau_{im}(t))^{\alpha}(\eta_{im})^{\beta}}, & j \in U_k \\ 0, & \text{otherwise} \end{cases} \tag{10}$$

where *Uk* represents the set of next hop nodes available to the ants, η*ij* is the heuristic information, α is the pheromone parameter, and β denotes the heuristic information parameter.

Ants transfer to the next hop node according to (10) until they arrive at the destination node. After all *q* ants have reached the destination node, the pheromone matrix is updated. It is first decreased by evaporation, controlled by the evaporation parameter ρ ranging from 0 to 1. The evaporation process avoids unrestrained accumulation of the pheromone concentration. If a partial path is not selected by ants, its pheromone concentration decreases gradually, so ants abandon this bad path over time. The pheromone value is increased where ants deposit pheromone; better paths receive more pheromone and are more likely to be selected in the future. Every pheromone value in the pheromone matrix is updated according to (7), (8), and (9). After the pheromone matrix is updated, the next iteration begins.
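The following sketch (a minimal illustration, not the authors' implementation) shows the two mechanics just described: roulette-wheel next hop selection per Equation (10) and the global pheromone update with MMAS clamping per Equations (6)–(9). Storing τ and η as nested dictionaries is an assumption for readability.

```python
import random

def choose_next_hop(i, candidates, tau, eta, alpha=1.0, beta=2.0):
    """Pick node j from the candidate set U_k with probability per Equation (10)."""
    weights = [(tau[i][j] ** alpha) * (eta[i][j] ** beta) for j in candidates]
    total = sum(weights)
    r, acc = random.uniform(0, total), 0.0
    for j, w in zip(candidates, weights):
        acc += w
        if acc >= r:
            return j
    return candidates[-1]

def update_pheromone(tau, tours, rho, Q, tau_min, tau_max):
    """Evaporate, deposit, and clamp pheromone (Equations (6)-(9)).

    tours: list of (path, length) pairs, one per ant that reached the sink."""
    for i in tau:
        for j in tau[i]:
            tau[i][j] *= (1.0 - rho)                 # evaporation, Equation (6)
    for path, length in tours:
        for i, j in zip(path, path[1:]):
            tau[i][j] += Q / length                  # deposit Q / L_k, Equation (8)
    for i in tau:
        for j in tau[i]:
            tau[i][j] = min(tau_max, max(tau_min, tau[i][j]))  # MMAS limits, (9)
```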

#### *3.2. The Improved Evaporation Parameter*

Researchers have proposed many methods to update the pheromone values. For example, Jovanovic and Tuba put forward an efficient pheromone correction procedure based on the concept of suspicion, which avoids the local convergence of the ACO and enhances its overall performance [59]. In this paper, we focus on the evaporation parameter ρ and propose an adaptive strategy to influence the update of the pheromone values. The evaporation parameter ρ is important to the ACO algorithm. In most ACO algorithms, ρ is a fixed value, and when the value of ρ is unreasonable, the convergence rate of the algorithm suffers. If the value is too small, the pheromone evaporates too slowly, making ants simply follow the path with a high pheromone concentration without trying to find other potential paths; that is, the algorithm easily falls into a local optimum. If the value is too large, the pheromone volatilizes too fast, which causes the ACO to converge slowly. The adaptive strategy for the evaporation parameter ρ is given by:

$$\rho(x) = \frac{X}{X+x} \times e^{-bx} \tag{11}$$

where *X* denotes the total number of iterations, *x* is the current number of iterations, and *b* is a constant. At the beginning, the pheromone volatilizes faster, and the pheromone concentration has a weaker guiding effect on the ants, which is helpful for the ants to find other potential paths. As the iterations increase, the value of ρ(*x*) gradually decreases, and the pheromone evaporation slows down. The positive feedback increases, which makes the ants tend to choose the path with a higher pheromone concentration. At this time, the ants have searched for feasible paths for a long time and the path
with a higher pheromone concentration is the better choice. So, the proposed strategy is capable of improving the global search ability and the convergence rate of the algorithm.
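As a quick illustration, the adaptive schedule of Equation (11) can be transcribed directly; the constant *b* is not specified in the text, so *b* = 0.01 below is an assumed value.

```python
import math

def evaporation(x, X, b=0.01):
    """Adaptive evaporation parameter rho(x) of Equation (11).

    X: total number of iterations; x: current iteration; b: constant (assumed)."""
    return X / (X + x) * math.exp(-b * x)

# rho starts high (fast evaporation, broad exploration) and decays with x,
# strengthening the positive feedback in later iterations:
for x in (1, 50, 100, 200):
    print(x, round(evaporation(x, X=200), 3))
```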

#### *3.3. The Heuristic Information*

In the traditional ACO algorithm, the heuristic information η*ij* is related only to the distance to the next hop node, and is calculated by:

$$\eta_{ij} = \frac{1}{d_{ij}} \tag{12}$$

where *dij* denotes the distance between node *i* and the next hop node *j*. Nevertheless, in UWSNs, the distance from node *j* to the SN also has an influence on the network energy consumption. If the next hop node *j* is closer to the SN, it tends to consume less energy to forward data. In addition, the energy of the next hop node also affects the balance of the energy consumption, which helps to prevent the node with low energy becoming the next hop node. Hence, this paper defines an improved strategy for heuristic information:

$$\eta_{ij} = \frac{1}{\sigma d_{ij} + (1 - \sigma)d_{js}} \times \frac{E_{j\mathrm{res}}}{E_{\mathrm{ini}}} \tag{13}$$

where σ is a constant ranging from 0 to 1, *Ejres* denotes the residual energy of the next hop node *j*, *Eini* indicates the initial energy of node *j*, and *djs* represents the distance from node *j* to the SN. From (13), we can see that the heuristic information is positively related to the residual energy of the next hop node *j*, and is negatively correlated with the distance between node *i* and node *j* and the distance from node *j* to the SN. It is more likely for node *j* to become the next hop node if the value of the heuristic information is larger.
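A one-line transcription of Equation (13) makes the trade-off explicit; σ = 0.5 below is an assumed value, as the text does not fix it here.

```python
def heuristic(d_ij, d_js, e_res, e_ini, sigma=0.5):
    """Improved heuristic information of Equation (13).

    d_ij: distance from node i to candidate j; d_js: distance from j to the SN;
    e_res / e_ini: residual over initial energy of j; sigma in (0, 1) (assumed)."""
    return (1.0 / (sigma * d_ij + (1.0 - sigma) * d_js)) * (e_res / e_ini)
```

A candidate that is both nearby, close to the SN, and energy-rich thus receives a large η value, exactly as described above.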

#### *3.4. The Proposal of Ant Searching Scope*

The searching scope is crucial to the algorithm. Too small a scope may result in a failure to find the next hop node, while too large a scope could slow the convergence of the algorithm. To alleviate this problem, this paper presents the searching scope shown in Figure 2, where *R* denotes the transmission radius of nodes and θ denotes the searching scope. The density of nodes in the network and the transmission radius of nodes are two important factors for the searching scope: a high density of nodes and a large transmission radius require only a small scope. Clearly, the smaller the value of θ, the closer the transmission direction is to the SN. Theoretically, when the value of θ is zero, the transmission direction from node *i* to the SN is best. In practice, however, there may not be enough nodes in that best transmission direction. If an ant cannot find an appropriate next hop node, the searching scope should be enlarged, while keeping θ smaller than 90 degrees.

**Figure 2.** The searching scope.
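One plausible geometric reading of Figure 2 (an assumption, since the text defines the scope only by *R* and θ) is that a candidate *j* is searchable from *i* if it lies within the transmission radius and the angle between the directions *i*→*j* and *i*→SN does not exceed θ:

```python
import math

def in_scope(i, j, sn, R, theta_deg):
    """True if candidate j is inside node i's searching scope (assumed geometry)."""
    v_ij = [b - a for a, b in zip(i, j)]
    v_is = [b - a for a, b in zip(i, sn)]
    d_ij = math.dist(i, j)
    d_is = math.dist(i, sn)
    if d_ij == 0 or d_ij > R:                      # out of transmission range
        return False
    cos_angle = sum(p * q for p, q in zip(v_ij, v_is)) / (d_ij * d_is)
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return angle <= theta_deg                      # within the cone toward the SN
```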

#### **4. Clustering Routing Algorithm Design**

The clustering routing algorithm has two main phases: a CHN selection phase and a data transmission phase. In the algorithm, the network is divided into cubes, and each cube is regarded as a cluster. In every cluster, the nodes compete for the CHN role: one of them is selected as the CHN and the others become CMNs. The CMNs collect data and send them to the CHN by a single hop. After receiving the data from the CMNs, the CHN processes the data and then transmits them to the SN in one data packet by multiple hops. The relay nodes on the multi-hop paths are other CHNs, and the optimal path to the SN is found by the improved ACO algorithm.

#### *4.1. Cluster Head Selection Phase*

CHNs play a very important role in data transmission. The CHNs are responsible for processing the data received from their CMNs and then forwarding the processed data to the SN. Many algorithms, such as the LEACH algorithm, select CHNs randomly without considering the residual energy of the nodes. If the residual energy of a selected CHN is too low, the node will die too early, which is bad for energy balance and network efficiency. Therefore, the residual energy of the nodes should be considered when selecting CHNs: if the residual energy of a node is less than the average energy of the nodes in its cluster, it is not qualified for selection. In this paper, we consider not only the residual energy of nodes but also the distance factor to select CHNs. Hence, we propose an index for CHN selection as follows:

$$I_i = \frac{\lambda E_{i\mathrm{res}}}{d_{is}\, d_{\mathrm{avg}}} \tag{14}$$

$$d_{\mathrm{avg}} = \frac{1}{N-1} \sum_{n=1}^{N-1} d_{in} \tag{15}$$

where λ is a constant, *Eires* is the residual energy of node *i*, *dis* is the distance between node *i* and the SN, *davg* is the average distance between node *i* and the other nodes in the cube, *N* is the total number of nodes in the cube, and *din* is the distance between node *i* and node *n* in the cube. It can be seen from (14) that it is more likely for a node to become a CHN if it has more residual energy, a shorter distance to the SN, and a shorter average distance to the other nodes in the cube.

In each cube, every qualified node calculates its value of *Ii* and broadcasts the message with its ID and *Ii* value to other nodes in the cube. Through comparisons, the node with the largest value of *Ii* will become a CHN. Then the CHN broadcasts the CHN message to the other nodes in the cube. After receiving the CHN message, the nodes reply to the CHN with an acknowledgement message and become CMNs. In addition, all the selected CHNs send message packets to the SN and the packets carry information such as the ID, the location, and the residual energy.
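A minimal sketch of this selection round, under the assumption that each node's position and residual energy are known within the cube (as the broadcast step above provides):

```python
import math

def chn_index(i, nodes, sn, lam=1.0):
    """I_i of Equation (14); nodes maps node id -> (position, residual energy)."""
    pos_i, e_res = nodes[i]
    d_is = math.dist(pos_i, sn)
    others = [math.dist(pos_i, nodes[n][0]) for n in nodes if n != i]
    d_avg = sum(others) / len(others)        # Equation (15)
    return lam * e_res / (d_is * d_avg)      # Equation (14)

def select_chn(nodes, sn, avg_energy):
    """Only nodes at or above the cluster's average energy qualify; the
    qualified node with the largest index becomes the CHN."""
    qualified = [n for n in nodes if nodes[n][1] >= avg_energy]
    return max(qualified, key=lambda n: chn_index(n, nodes, sn))
```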

#### *4.2. Data Transmission Phase*

The data transmission phase includes intra-cluster data transmission and inter-cluster data transmission. In the intra-cluster data transmission, the CHNs allocate time slots with a time division multiple access (TDMA) scheme for the CMNs to send data packets to their own CHNs by a single hop. After the CMNs have transmitted their data packets for the round, they switch to sleep mode in order to reduce energy consumption. After receiving the data packets from all the CMNs in the cluster, the CHNs process the data and then transmit them to the SN using a carrier sense multiple access with collision detection (CSMA/CD) mechanism through multiple hops; the optimal multi-hop paths are found by the improved ACO algorithm. CHNs near the SN can directly transmit their data to the SN by a single hop. The process of the improved ACO algorithm is shown in Figure 3 and the steps are given as follows:

**Figure 3.** The process of the improved ant colony optimization (ACO) algorithm.

Step 1: To ensure the initial search ability of ants, the initial energy and the initial pheromone concentration of each node are set to be equal. Each node has a unique ID.

Step 2: The source node generates a forward ant at regular intervals. The format of the routing table carried by the forward ant is shown in Table 1. The taboo list indicates the nodes that the ant has visited and these nodes cannot be accessed in future searches.


Step 3: The transfer probability to the next hop node is calculated by (10). The ant transfers to the next CHN according to this probability. Then this next hop node is added to the taboo list, and the hop count is increased by one.

Step 4: Step 3 is repeated until the ant reaches the SN. At the same time, the forward ant dies, and the corresponding backward ant is generated. The backward ant carries the routing information of the forward ant and returns to the source node by the path that the forward ant made. The routing information no longer changes as the backward ant returns. When the backward ant reaches the source node, a routing path is established.

Step 5: Steps 2, 3, and 4 are repeated until all ants have completed a path search. By this time, the present iteration ends. Then the search paths of the ants are noted, the taboo list is cleared, and the pheromone is updated according to (9).

Step 6: Steps 2 to 5 are repeated until the preset number of iterations is reached, and the optimal path is output.

It is noted that only one CHN finds its optimal path to the SN after Step 6, and the number of CHNs is equal to the number of small cubes in the network. Hence, by changing the IDs of the source nodes and repeating the whole process of the ACO algorithm, the paths from the other CHNs to the SN can be determined. In this paper, the destination node is always the SN and the CHNs that need to send data packets become the source nodes. The relay nodes on multi-hop paths are other CHNs. Furthermore, the search process for the optimal multi-hop routing paths is accomplished in the SN because it has a continuous energy supply. After the CHNs are selected, they send the SN messages with information such as IDs, locations, and residual energy so that the SN can figure out the optimal paths by using the improved ACO and then transmit the routing information to the CHNs.
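Pulling Steps 1–6 together, a condensed driver might look as follows. It reuses `choose_next_hop`, `update_pheromone`, and `evaporation` from the sketches above; `neighbors(i, taboo)` and `path_length(path)` are hypothetical helpers (the first would apply the searching scope of Section 3.4, the second would sum hop distances).

```python
def find_path(source, sink, tau, eta, neighbors, path_length,
              n_ants, n_iters, Q, tau_min, tau_max, b=0.01):
    """Improved ACO path search from one CHN (source) to the SN (sink)."""
    best_path, best_len = None, float("inf")
    for x in range(1, n_iters + 1):                # Step 6: iterate
        tours = []
        for _ in range(n_ants):                    # Steps 2-4: one ant's trip
            path, taboo = [source], {source}
            while path[-1] != sink:
                cand = neighbors(path[-1], taboo)  # CHNs in the searching scope
                if not cand:                       # dead end: abandon this ant
                    break
                nxt = choose_next_hop(path[-1], cand, tau, eta)
                path.append(nxt)
                taboo.add(nxt)
            if path[-1] == sink:
                length = path_length(path)
                tours.append((path, length))
                if length < best_len:
                    best_path, best_len = path, length
        rho = evaporation(x, n_iters, b)           # Step 5: adaptive evaporation
        update_pheromone(tau, tours, rho, Q, tau_min, tau_max)
    return best_path
```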

By the time all the CHNs have sent the data to the SN, one round is over. At this time, if in one cube the residual energy of the CHN is more than half of the average energy of other nodes, the CHN of the next round stays the same, which can save energy and time. Otherwise, a new CHN is selected in the next round. The new selected CHN transmits its information to the SN so that the SN can restart the process of the ACO algorithm and find the optimal path for the new CHN.

#### **5. Simulation Results and Analyses**

For the convenience of comparison, the proposed algorithm in this paper is called ant colony optimization clustering routing (ACOCR). Five popular existing algorithms, LEACH [18], DUCS [19], LEACH-ANT [47], CUWSN [23], and EOCA [11], were chosen as references to validate the proposed algorithm in terms of the number of surviving nodes, the energy consumption of the network, and the packet loss ratio. MATLAB was used to carry out the simulation, where sensor nodes were randomly placed in a 3D area of 5000 m × 5000 m × 1000 m and the coordinate of the SN was (2500, 2500, 0). The network was divided into 64 cubes. The number of sensor nodes ranged from 300 to 500 for different scenarios. The data packet was 1024 bits in length and the data transmission rate was 2048 bps, from which the time duration for a node to transmit and receive data packets could be calculated. The broadcast and other message packets were 64 bits in length. The sound speed was 1500 m/s. As for the energy consumption parameters, the receiving power *P*0 was set to 50 μW and the initial energy of every node was 120 J. The frequency *f* was 10 kHz.

#### *5.1. Comparison and Analysis of the Number of Surviving Nodes*

Figure 4 shows the number of surviving nodes versus the number of rounds for the proposed algorithm and reference algorithms when 400 nodes were considered in the network, from which we can see that the number of surviving nodes decreases with the increase in the network rounds no matter which algorithm is used. However, by using the proposed ACOCR, the network always has the largest number of surviving nodes.

**Figure 4.** The number of surviving nodes versus the number of rounds for the six algorithms.

In order to further assess the network lifetime, this paper introduces several metrics: first node dead (FND), half of the nodes dead (HND), and last node dead (LND). Figure 5 illustrates the number of rounds at which FND, HND, and LND arise for the six algorithms, from which we can see that the first node of the ACOCR, EOCA, CUWSN, LEACH-ANT, DUCS, and LEACH dies in about the 806th, 686th, 632nd, 569th, 481st, and 423rd round, respectively. This indicates that, with respect to the FND metric, the efficiency of the proposed ACOCR is 17.5%, 27.5%, 41.7%, 67.6%, and 90.5% higher than that of EOCA, CUWSN, LEACH-ANT, DUCS, and LEACH, respectively. As for the HND and LND, the proposed ACOCR outperforms LEACH by 63.2% and 65.2%, respectively. In conclusion, the proposed ACOCR algorithm has the best performance in prolonging the network lifetime because it adopts the improved CHN selection scheme, comprehensively considering the residual energy of the nodes, the distance between the node and the SN, and the average distance between the node and the other nodes in the cube. This CHN selection distributes the network load equally and prevents the nodes with low energy from becoming CHNs, thus preventing the premature death of nodes. Additionally, the ACOCR employs the improved ACO to find the optimal paths between CHNs and the SN in order to reduce the energy consumption. LEACH has the worst performance, as it randomly selects CHNs without considering the residual energy of the nodes, which allows some nodes with insufficient residual energy to be selected as CHNs and thus die too early. In addition, it does not consider multi-hop paths when the CHNs send data packets to the SN. The LEACH-ANT algorithm and the DUCS algorithm outperform the LEACH algorithm. This is because the LEACH-ANT algorithm employs ACO algorithms to look for the next hop node, and the DUCS algorithm selects the CHN according to the residual energy of the node. However, the LEACH-ANT algorithm does not optimize the CHN selection or improve the ACO algorithm, and the DUCS algorithm does not consider the optimal paths from CHNs to the SN. Hence, they are inferior to the proposed ACOCR algorithm.

**Figure 5.** The number of rounds when first node dead (FND), half of the nodes dead (HND), and last node dead (LND) arise for the six algorithms.

#### *5.2. Comparison and Analysis of the Energy Consumption of the Network*

Figure 6 illustrates the total energy consumption versus the number of rounds for the six algorithms when 400 nodes were considered in the network, from which we can see that the total energy consumption rises with the increase in the network rounds regardless of which algorithm is used. However, the proposed ACOCR algorithm is the most efficient one in reducing the energy consumption. For example, in round 600, the total consumed energy of the ACOCR, EOCA, CUWSN, LEACH-ANT, DUCS, and LEACH accounts for 32.5%, 41.1%, 50.6%, 58.2%, 65.4%, and 83.8% of the initial energy of the network, respectively. As for the network energy that is completely consumed, the energy efficiency of the ACOCR is improved by 14.7%, 18.3%, 29.3%, 45.3%, and 65.2% compared to that of the EOCA, CUWSN, LEACH-ANT, DUCS, and LEACH, respectively. This is because the proposed ACOCR optimizes CHN selection and employs the optimal paths found by the improved ACO algorithm to transmit the data packets, thereby minimizing the energy consumption. The EOCA and the CUWSN outperform the LEACH, DUCS, and LEACH-ANT. However, both of them are inferior to the ACOCR, which is because neither of them optimizes the multi-hop paths for data transmission.

**Figure 6.** The total energy consumption versus the number of rounds for the six algorithms.

Figure 7 demonstrates the number of rounds when the network energy is exhausted versus the different number of nodes in the network for the six algorithms, which validates the effect of the different number of network nodes on energy consumption. As the number of nodes increases, the number of rounds when the network energy is exhausted also increases. This is because more nodes in the network lead to a better balance of energy consumption. The proposed ACOCR outperforms the other five algorithms in all situations. For example, when there are 450 nodes in the network, the ACOCR algorithm is 10.1%, 15.6%, 19.2%, 43.4%, and 52.9% more efficient than the EOCA algorithm, the CUWSN algorithm, the LEACH-ANT algorithm, the DUCS algorithm, and the LEACH algorithm, respectively.

**Figure 7.** The number of rounds when energy is exhausted versus the number of nodes for the six algorithms.

#### *5.3. Comparison and Analysis of the Packet Loss Ratio*

Table 2 provides the packet loss ratio after round 1200 for the six algorithms when 400 nodes were considered in the network. The packet loss ratio is defined in this paper as the ratio of the number of data packets lost in transit (i.e., sent by the CHNs but not received by the SN) to the number of data packets that the CHNs send during the whole simulation process. As we can see from the table, the packet loss ratio of the proposed ACOCR is the lowest. LEACH, which performs the worst, has a packet loss ratio about 1.62 times higher than that of the proposed ACOCR. This is because the ACOCR adopts the improved ACO algorithm to find the optimal routing paths, which reduces the risk of packet loss.



Figure 8 demonstrates the received packets by the SN versus the number of rounds for the six algorithms when 400 nodes were considered in the network. The more packets the SN receives, the more efficient the algorithm is. Apparently, the ACOCR algorithm has the best performance, the efficiency of which is 18.6%, 27.4%, 44.1%, 60.9%, and 84.1% higher than that of the EOCA, CUWSN, LEACH-ANT, DUCS, and LEACH, respectively, in round 1200.

**Figure 8.** The number of received packets versus the number of rounds for the six algorithms.

#### **6. Conclusions**

To alleviate the problem of energy consumption in UWSNs, this paper presented an energy-efficient clustering routing algorithm based on an improved ACO algorithm. The contributions of the paper were as follows. Firstly, improved heuristic information was proposed based on the residual energy of nodes and the distance factor. Secondly, this paper provided an adaptive strategy for the evaporation parameter in the pheromone update mechanism, which helps the global search ability and the convergence rate of the algorithm. Thirdly, this paper proposed the ant searching scope. Fourthly, we optimized the CHN selection by
considering the residual energy of nodes, the distance from the node to the SN, and the average distance between the node and the other nodes in the cube. Finally, simulation results demonstrated that the proposed ACOCR algorithm outperforms the LEACH, the DUCS, the LEACH-ANT, the CUWSN, and the EOCA in terms of the network lifetime, the energy consumption, and the packet loss ratio. A limitation of the paper is that the multipath effect of underwater channels was not considered. Therefore, we plan to study the multipath effect on data packet transmission and design cross-layer protocols in the future. Moreover, in this paper, we employed a random method to generate the network nodes. In order to make the network model closer to practical situations, we plan to use NS-3 to simulate our algorithm, calling its functions to generate the nodes and set attributes for them.

**Author Contributions:** Conceptualization, X.X. and H.H.; methodology, X.X.; software, X.X.; validation, X.X. and H.H.; formal analysis, X.X.; investigation, X.X.; resources, X.X.; data curation, X.X.; writing—original draft preparation, X.X.; writing—review and editing, X.X.; visualization, X.X.; supervision, H.H.; funding acquisition, H.H. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Key R&D Program of China, grant number 2018YFC1405904.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

**Citywide Cellular Traffic Prediction Based on a Hybrid Spatiotemporal Network**

#### **Dehai Zhang \*, Linan Liu, Cheng Xie, Bing Yang and Qing Liu**

School of Software, Yunnan University, Kunming 650504, China; liulinan@mail.ynu.edu.cn (L.L.); xiecheng@ynu.edu.cn (C.X.); yang.bing@mail.ynu.edu.cn (B.Y.); liuqing@ynu.edu.cn (Q.L.)

**\*** Correspondence: dhzhang@ynu.edu.cn

Received: 27 November 2019; Accepted: 6 January 2020; Published: 8 January 2020

**Abstract:** With the arrival of 5G networks, cellular networks are moving in the direction of diversified, broadband, integrated, and intelligent networks. At the same time, the popularity of various smart terminals has led to an explosive growth in cellular traffic. Accurate network traffic prediction has become an important part of cellular network intelligence. In this context, this paper proposes a deep learning method for spatiotemporal modeling and prediction of cellular network traffic. First, we analyze the temporal and spatial characteristics of cellular network traffic from Telecom Italia. On this basis, we propose a hybrid spatiotemporal network (HSTNet), a deep learning method that uses convolutional neural networks to capture the spatiotemporal characteristics of communication traffic. This work adds deformable convolution to the convolution model to improve predictive performance. The time attribute is introduced as auxiliary information. An attention mechanism based on historical data for weight adjustment is proposed to improve the robustness of the model. We use the dataset of Telecom Italia to evaluate the performance of the proposed model. Experimental results show that, compared with existing statistical methods and machine learning algorithms, HSTNet significantly improves the prediction accuracy in terms of MAE and RMSE.

**Keywords:** communication traffic prediction; intelligent traffic management; deformable convolution; attention mechanism

#### **1. Introduction**

With the advent of fifth-generation mobile networks (5G), the cellular Internet of Things (IoT) has become a popular topic in industry [1,2]. The Groupe Speciale Mobile Association (GSMA) predicted that by 2020, the number of IoT connections would exceed 30 billion, and the number of connections based on cellular technology would reach one to two billion. The current 4G wireless network has greatly affected our lives, and the stable communication system brought by the future 5G will become a powerful promoter of Industry 4.0 [3–5]. Real-time and secure data transmission is an important guarantee for Industry 4.0, and 5G is characterized by high data volume, high security, and short delay time.

At the same time, the explosive growth of global mobile devices and the IoT has also accelerated the era of big data [6,7]. Communication equipment plays an increasingly important role in people's daily lives, such as sensing, communication, entertainment, and work. A large number of communication services have generated countless mobile data; the wireless cellular networks carrying the data have become increasingly advanced and complex; and a large quantity of real-time system operation data is generated every moment. To realize intelligent management of cellular networks, it is very important to perform real-time or non-real-time regular analysis and accurate prediction of cellular traffic. For example, accurate prediction of future traffic can greatly increase the efficiency of demand aware resource allocation [8].

However, cellular network traffic follows elusive patterns, and the variation in traffic in a particular region is strongly correlated with many external factors, such as business, location, time, and user lifestyle. To extract the changing characteristics of cellular network traffic more effectively, many related studies have been carried out. The existing methods can be divided into two types: statistical or probabilistic methods and machine learning methods.

The first kind includes the autoregressive integrated moving average (ARIMA) [9,10], the *α*-stable distribution [11], and the covariance function [12]. For the traffic prediction problem, these methods have comprehensively studied the characteristics of cellular networks and have shown that changes in communication traffic exhibit both temporal and spatial autocorrelation. However, as the communication modes of cellular networks become more complex and subject to many external factors, these traditional linear statistical methods are no longer suitable for current communication traffic prediction problems.

With the development of artificial intelligence technology, machine learning methods have been widely used in industry and have also been applied to cellular network traffic prediction in recent years [13]. Early researchers proposed using linear regression [14] and SVM regression [15] to predict cellular traffic. Many studies have also proposed traffic prediction methods based on deep learning. In 2017, the authors of [16] proposed a deep learning-based prediction method to model the long-term dependence of cellular network traffic. The method mainly uses autoencoder-based deep models and long short-term memory (LSTM) cells for spatiotemporal modeling. However, the autoencoder compression loses some of the original information, and the ability to extract spatial features needs to be improved. In 2018, Zhang et al. [17] proposed a cellular traffic prediction method based on a convolutional neural network (STDenseNet); however, this method did not consider the impact of external conditions on traffic prediction, and the traditional convolution method had a limited ability to capture the complex spatial characteristics of cellular traffic.

Motivated by the aforementioned problems, based on STDenseNet, this work proposes a new hybrid spatiotemporal network (HSTNet). First, the deformable convolution unit is used in the model to improve the ability to extract complex spatial features. Then, time characteristics are introduced to enhance the accuracy of traffic prediction. Finally, an attention mechanism based on traffic history data is proposed to further enhance the robustness of the model.

#### **2. Data Observation and Analysis**

The wireless communication data analyzed in this paper were from Telecom Italia and consist of the traffic statistics of data sent or received by users in specific areas of Milan [18]. The dataset consisted of a time series of traffic from 1 November 2013 to 1 January 2014, with an interval of 10 min, and included three parts: short message service (SMS), call service (Call), and Internet access (Internet). The entire urban area was divided into a grid of *H* × *W* cells, where *H* and *W* represent the number of rows and columns. In this dataset, *H* = *W* = 100, indicating that the area of Milan is covered by a grid of 10,000 cells, each about 235 m × 235 m, and the value of a cell represents the statistics of the traffic in and out of that area. Traffic data were recorded from 00:00 on 1 November 2013 to 23:59 on 1 January 2014. We merged the data at ten-minute intervals into hour intervals and divided each dataset into 1488 (62 days × 24 h) fragments. In the datasets, SMS and Call contained two dimensions of traffic, namely receiving and sending, while the Internet only recorded the traffic that was accessed. In order to compare the spatiotemporal characteristics of the traffic of the three services clearly, we combined the traffic of the receiving and sending dimensions into one.

Thus, the entire dataset could be represented as a tensor of dimensions [*c*, *t*, *H*, *W*], where *c* represents the type of cellular traffic, *c* ∈ {SMS, Call, Internet}; *t* represents the time interval of the flow, with *t* = 1 h in this work; and *H* and *W* are as described above. Each time slice **F**<sub>*c*,*t*</sub> is given by:

$$\mathbf{F}_{c,t} = \begin{bmatrix} f_{c,t}^{(1,1)} & f_{c,t}^{(1,2)} & \dots & f_{c,t}^{(1,W)} \\ f_{c,t}^{(2,1)} & f_{c,t}^{(2,2)} & \dots & f_{c,t}^{(2,W)} \\ \vdots & \vdots & \ddots & \vdots \\ f_{c,t}^{(H,1)} & f_{c,t}^{(H,2)} & \dots & f_{c,t}^{(H,W)} \end{bmatrix} \tag{1}$$

where *f*<sub>*c*,*t*</sub><sup>(*h*,*w*)</sup> represents the traffic statistic of type-*c* traffic in cell (*h*, *w*) at time *t*.
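As a sketch (array shapes are assumptions based on the description above), the preprocessing can be expressed as summing the receive/send dimensions and aggregating six 10-minute snapshots per hour:

```python
import numpy as np

def to_hourly(raw):
    """raw: array [T10, H, W, 2] of 10-min (receive, send) grids.

    Returns an hourly tensor [T10 // 6, H, W], i.e., one F_{c,t} slice per hour."""
    merged = raw.sum(axis=-1)                # combine receive + send dimensions
    t10, h, w = merged.shape
    hours = t10 // 6                         # six 10-minute slots per hour
    return merged[:hours * 6].reshape(hours, 6, h, w).sum(axis=1)

# Dummy data with a small grid; the real dataset is 1488 h over 100 x 100 cells.
hourly = to_hourly(np.random.rand(48 * 6, 20, 20, 2))
print(hourly.shape)                          # (48, 20, 20)
```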

#### *2.1. Temporal Domain*

Figure 1 shows the trends of the three different traffic flows over a 48 h period; from top to bottom: SMS, Call, and Internet. From the figure, we can clearly see that the three types of traffic followed a strong daily pattern. Basically, the traffic started to increase at approximately eight o'clock in the morning, stayed at a very high level during the day, and started to fall at approximately eight o'clock in the evening. There were also significant differences between the services. Internet traffic remained at a considerable level even at night, while Call and SMS were mostly concentrated during the day. These patterns clearly correspond with people's daily lives. Compared to daytime work and life, people send very few SMS messages and make few calls at night. However, the Internet not only serves communication needs; automatic access by equipment, entertainment, and other factors lead to a large number of Internet accesses at night. Therefore, its day–night variation is relatively small compared to SMS and Call.

**Figure 1.** Hourly traffic change statistics.

Figure 2 shows the dynamics of the daily total traffic in November 2013. The three types of traffic had obvious differences between working days and holidays, and the traffic on holidays was much lower than that on working days, which shows that whether a day is a working day has an important impact on daily traffic. For example, the first day in the figure is an Italian legal holiday, and the second and third days are a weekend; the total traffic on these three days was significantly less than on the next five working days.

Comparing the changes in the three types of traffic on weekdays and holidays, we found that the gap in Call traffic between the two kinds of days was considerable, with working day traffic generally close to twice the holiday traffic. The gap for SMS traffic was smaller than for Call, but still large. The reason is obvious: users need to communicate more during the working day. Similar to the pattern in Figure 1, the daily variation of Internet traffic was smaller than that of SMS and Call.

**Figure 2.** Daily traffic change statistics.

#### *2.2. Spatial Domain*

Figure 3 shows the spatial distribution of Call traffic over a certain period of time. We can easily see that the traffic distribution across the city was very uneven. Intense traffic was concentrated in the downtown area, while the suburbs had very sparse traffic. Moreover, it can be clearly seen from Figure 3 that a few areas carried much more traffic than the others. These areas are usually the bustling areas of the city and are often the most burdened areas of wireless networks, so accurate prediction of traffic in these areas is important. However, the areas carrying large amounts of traffic also pose great difficulties for traffic prediction modeling. The existence of such singular values is very detrimental to the fitting of the model. Moreover, traffic fluctuations in these areas are often much larger than in other areas, and more research is needed to capture their spatiotemporal characteristics well.

**Figure 3.** Spatial distribution of cellular network traffic.

The correlation between traffic changes in different cells cannot be seen intuitively in Figure 3. To better show the spatial correlation of cellular traffic, we extracted 11 × 11 cells from the Call dataset for correlation analysis. Figure 4 shows the Pearson correlation coefficient *ρ* for the spatial correlation between the target cell *x*<sup>(*i*,*j*)</sup> and its neighboring cells *x*<sup>(*i*′,*j*′)</sup>. The Pearson correlation coefficient is a widely used metric [16,17], defined as follows:

$$\rho = \frac{\mathrm{cov}\left(x^{(i,j)}, x^{(i',j')}\right)}{\sigma_{x^{(i,j)}}\,\sigma_{x^{(i',j')}}} \tag{2}$$

where cov represents the covariance operator and *σ* represents the standard deviation. Figure 4 shows that the spatial correlation between different urban areas depends not only on their proximity, but also on many external factors. For example, cells (3,4) and (4,3) are the same distance from the target cell (5,5), but their correlation coefficients are very different (0.35 and 0.96). In general, the change in traffic is not necessarily highly related to neighboring cells and may also be strongly related to non-adjacent cells. However, traditional convolution can only extract information from neighboring cells. Therefore, we needed a new way to capture potentially relevant information.

**Figure 4.** Spatial correlation analysis.

#### **3. Cellular Traffic Prediction Model**

#### *3.1. Model Framework Introduction*

This section introduces our proposed hybrid spatiotemporal network, HSTNet, which is based on STDenseNet. HSTNet consists of two input sections and three modules: the convolution module, the time embedding module, and the attention module. The model's predicted value *P*′(*h*,*w*) is obtained by combining the outputs of the three modules. The model framework is shown in Figure 5.

The traffic of each time period in the datasets was counted over 100 × 100 cells; that is, the traffic data of each time period form a 100 × 100 traffic distribution matrix. Therefore, we could effectively extract the spatial correlation of cellular traffic through convolutional neural networks. The main prediction module of our model was likewise implemented with an improved convolutional neural network (DenseNet) [19]. The historical data comprised the last three time periods and the current time period of the previous three days. We considered that cellular traffic is not only highly correlated with the traffic distribution of the previous few hours, but also depends on the traffic distribution at the current moment of the previous few days. Therefore, we fed the two pieces of historical data into the model separately. For example, to forecast the traffic at 12 o'clock on the 11th of a certain month, the input data were the traffic at 12 o'clock on the 8th, 9th, and 10th of the month and the traffic at 9, 10, and 11 o'clock on the same day. After the historical data were entered into the model, they were processed by two modules: the convolution module and the attention module. The convolution module consisted of two DenseNets with deformable convolutions [20] that handled the two parts of the historical data. Their output matrices were fused through a matrix of learnable parameters.
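As a sketch, the two historical inputs can be assembled as follows, assuming hourly slices stored in a hypothetical array `traffic` of shape (T, 100, 100):

```python
import numpy as np

def build_inputs(traffic, t, n=3, period=24):
    """Build the two historical inputs for predicting hour t.
    Returns the last n hours (closeness) and the same hour on the previous
    n days (daily period), each stacked along a channel axis."""
    closeness = np.stack([traffic[t - k] for k in range(1, n + 1)], axis=-1)
    daily = np.stack([traffic[t - k * period] for k in range(1, n + 1)], axis=-1)
    return closeness, daily
```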

In Section 2, we analyzed the time correlation of cellular traffic data: traffic correlates strongly with the time of day and with whether the day is a holiday. The input data of the time embedding module therefore include the hour value of the predicted time and the attribute of the date (working day or holiday). The matrix generated by the time embedding module is added to the matrix output by the convolution module.

The attention module receives the two historical data items as input and integrates them, outputting a weight matrix based on the historical data. This matrix adjusts the weights of the outputs of the first two modules to produce *O*(*h*,*w*). Finally, the prediction matrix *P*′(*h*,*w*) is output through the sigmoid function and compared with the real value *P*(*h*,*w*).

**Figure 5.** The hybrid spatiotemporal network's (HSTNet) framework structure.

#### *3.2. Convolution Module*

In recent years, improvements to convolutional neural network performance have mainly come from two directions. One is depth, e.g., ResNet, which solves the problem of vanishing gradients when the network is very deep [21]. The other is width, e.g., the Inception network, which uses multi-scale convolution kernels to extend the model's generalization ability. In STDenseNet, the prediction module consists of two densely connected convolutional networks with the same structure. Similar to ResNet, DenseNet establishes dense connections between earlier and later layers; that is, each layer accepts the outputs of all the previous layers as input, implementing feature reuse. Through this extreme reuse within the network architecture, DenseNet achieves fewer parameters and leading performance compared to traditional models [19].

Cellular traffic has strong spatiotemporal autocorrelation. This work uses DenseNet to extract the spatiotemporal features of historical data, which achieves better results than a traditional single-channel convolutional neural network. The convolution module in HSTNet contains two separate DenseNets that process the two sets of historical data. Each DenseNet consists of three layers, each built from a unit block we call the DenseBlock. The outputs of the two DenseNets are multiplied by learnable parameter matrices and then added.
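The learnable-matrix fusion of the two DenseNet outputs can be sketched as a small custom Keras layer (the experiments below use Keras 2.1.6); the class name and the `ones` initialization are our assumptions:

```python
from keras.layers import Layer

class FusionLayer(Layer):
    """Learnable element-wise fusion: out = W1 * x1 + W2 * x2, where W1 and W2
    are trainable matrices the same shape as the DenseNet output maps."""
    def build(self, input_shape):
        shape = tuple(input_shape[0][1:])
        self.w1 = self.add_weight(name='w1', shape=shape,
                                  initializer='ones', trainable=True)
        self.w2 = self.add_weight(name='w2', shape=shape,
                                  initializer='ones', trainable=True)
        super(FusionLayer, self).build(input_shape)

    def call(self, inputs):
        x1, x2 = inputs
        return self.w1 * x1 + self.w2 * x2

    def compute_output_shape(self, input_shape):
        return input_shape[0]

# Usage: fused = FusionLayer()([densenet_out_closeness, densenet_out_period])
```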

#### *3.3. Deformable Convolution*

In recent years, convolutional neural networks have achieved excellent performance in many image fields thanks to their strong feature extraction ability and end-to-end learning. A convolution samples different regions of the input image and then convolves the sampled information into an output. This operation means that the geometric deformation capability of the model does not come from the network, but from the diversity of the dataset. For example, let *Q* be the receptive field covered by the convolution kernel; for a 3 × 3 kernel, *Q* = {(−1,−1), (−1,0), ..., (0,1), (1,1)}. For any pixel *P*<sup>0</sup> on the feature map, standard convolution is computed as follows:

$$y\left(P\_0\right) = \sum\_{P\_n \in Q} w\left(P\_n\right) \cdot x\left(P\_0 + P\_n\right) \tag{3}$$

Because the ordinary convolution method has limited adaptability to the complex spatial correlation of cellular network traffic, we introduced a deformable convolution to the model. The process of deformable convolution is to add an offset variable Δ*Pn* at each sampling point position. Δ*Pn* can be continuously learned and adaptively changed according to the current image content. This means that the convolution kernel is not limited to a fixed position sampling method, but can search for the region of interest near the current position for sampling. Thus, the convolution kernel improves the feature extraction capability for complex spaces. The following is the calculation process of the deformable convolution:

$$y\left(P\_0\right) = \sum\_{P\_n \in Q} w\left(P\_n\right) \cdot x\left(P\_0 + P\_n + \Delta P\_n\right) \tag{4}$$

A traditional convolution only needs to train the pixel weight parameters of each convolution window. A deformable convolutional network must additionally train parameters for the shape of the convolution window, that is, the offset vector of each pixel. The offset field in Figure 6 is this additional parameter set; its size matches the input feature map. The convolution window slides over the offset field to shift the convolution sampling points and optimize them.

**Figure 6.** Illustration of a 3 × 3 deformable convolution.

It can be seen from the above analysis that if Δ*Pn* = 0, the deformable convolution reduces to a normal convolution, with no performance improvement. In cellular traffic prediction, an ordinary convolution can only extract features over a fixed-size range, whereas a deformable convolution can extend the feature extraction range to more relevant surrounding areas by learning the offset variable Δ*Pn*. During training, the model can shift from the area covered by an ordinary convolution kernel to other, more correlated areas, effectively avoiding interference from uncorrelated spatial features and thus improving the predictive performance for cellular traffic.

In summary, to improve the spatial feature extraction ability of the model, a deformable convolution unit is added to each layer (DenseBlock) of the DenseNet. As shown in Figure 7, the original DenseBlock consists of a batch normalization (BN) layer, a rectified linear unit (ReLU) layer, and a 3 × 3 convolution layer. Our improvement changes the 3 × 3 convolution layer to a 1 × 1 layer, which reshapes the data of the current DenseBlock, and then appends a BN layer, a ReLU layer, and a deformable convolution layer in turn.

Since the traditional DenseBlock reuses the features of all previous layers, deeper DenseBlocks require more parameters. Adding a 1 × 1 convolution layer compresses the input features to a lower dimension and reduces the number of model parameters. After replacing the DenseBlocks in the three-layer DenseNet with the improved version, the model parameters dropped from 230 thousand to 170 thousand. Replacing traditional convolutions with deformable convolutions expands the receptive range of the convolution and improves the extraction of spatial features.
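A minimal Keras sketch of the improved DenseBlock follows. Keras provides no built-in deformable convolution, so a plain 3 × 3 Conv2D stands in where the deformable layer of [20] belongs, and the filter count is illustrative:

```python
from keras.layers import BatchNormalization, Activation, Conv2D, concatenate

def improved_dense_block(x, filters=32):
    """Improved DenseBlock (Figure 7): BN -> ReLU -> 1x1 conv (feature
    compression) -> BN -> ReLU -> deformable conv, with a dense connection."""
    h = BatchNormalization()(x)
    h = Activation('relu')(h)
    h = Conv2D(filters, (1, 1), padding='same')(h)  # compress reused features
    h = BatchNormalization()(h)
    h = Activation('relu')(h)
    # Placeholder: swap this Conv2D for a deformable convolution layer [20].
    h = Conv2D(filters, (3, 3), padding='same')(h)
    return concatenate([x, h])                      # DenseNet-style feature reuse
```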

**Figure 7.** Improved structure of DenseBlock.

#### *3.4. Time Embedding Module*

In the previous data analysis, we saw that time had a strong correlation with communication traffic. To capture the temporal characteristics of the data for better prediction, we collected Italian holiday information and introduced hours and holidays as external features in the model input. The specific process was as follows: the date attribute was encoded as a one-dimensional indicator *Is\_of\_Holiday* (1 for a holiday, 0 for a working day), and the hour was encoded as a 24-dimensional one-hot vector *Hour\_of\_Day*.

These two vectors were combined into a 25-dimensional vector *T*. For example, if the predicted time point is 12:00:00 on 11/01/2013 (an Italian holiday), the two time features *Is\_of\_Holiday* (1) and *Hour\_of\_Day* (a 24-dimensional one-hot vector whose 12th bit is 1) are extracted to form the time feature vector *T*. The feature vector *T* is input to a two-layer fully connected network, whose output is an *H* × *W*-dimensional vector *v*<sub>time</sub>. A reshape layer, which receives a vector of length *H* × *W* and shapes it into an *H* × *W* matrix, turns this vector into a matrix *M*<sub>time</sub> that is merged with the output of the prediction branch. The calculation process is as follows:

$$v\_{\text{time}} = \sigma\left(W\_{\text{time}}^2 \sigma\left(W\_{\text{time}}^1 T + b\_{\text{time}}^1\right) + b\_{\text{time}}^2\right) \tag{5}$$

where *W<sup>i</sup>*<sub>time</sub> and *b<sup>i</sup>*<sub>time</sub> are the learnable parameters of the *i*th fully connected layer, and *σ* denotes the sigmoid activation function.

$$M\_{\text{time}} = \text{Reshape}\left(v\_{\text{time}}\right) \tag{6}$$

where *v*<sub>time</sub> ∈ **R**<sup>HW×1</sup> and *M*<sub>time</sub> ∈ **R**<sup>H×W</sup>.
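A sketch of the time embedding module in Keras; the hidden width of the first fully connected layer is our assumption, as the paper does not state it:

```python
import numpy as np
from keras.layers import Input, Dense, Reshape

H, W = 100, 100

def time_feature(hour, is_holiday):
    """Build the 25-dim vector T: 24-dim one-hot hour plus a holiday flag."""
    t = np.zeros(25, dtype='float32')
    t[hour] = 1.0                        # Hour_of_Day one-hot bit
    t[24] = 1.0 if is_holiday else 0.0   # Is_of_Holiday indicator
    return t

t_in = Input(shape=(25,))
h = Dense(128, activation='sigmoid')(t_in)       # Equation (5), first layer
v_time = Dense(H * W, activation='sigmoid')(h)   # Equation (5), second layer
m_time = Reshape((H, W))(v_time)                 # Equation (6): H x W matrix
```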

#### *3.5. Attention Module*

The human brain receives considerable external input information at every moment. When the human brain receives this information, it consciously or unconsciously uses the attention mechanism to obtain more important information. At present, this attention mechanism has been introduced into the fields of natural language processing, object detection, semantic segmentation, etc., and has achieved good results. In our work, an attention mechanism was added to the network as a module to optimize the density map generated by the prediction.

In the traffic dataset, the statistical value of the most densely loaded areas is often much larger than that of most other areas. For example, in one traffic distribution matrix, the average is 30, but the maximum is 4000. This is very disadvantageous for accurate prediction by convolutional neural networks. Therefore, to mitigate the excessive value gap between different regions during prediction and improve the overall performance of the model, this paper proposes a weight adjustment scheme based on an attention mechanism. The traffic density distribution of the density map has a strong correlation with the corresponding historical data. Therefore, we took the corresponding historical data as input, merged them into a two-column attention matrix through a 1 × 1 convolution kernel, and normalized it to form a weight matrix *W*(*h*,*w*). Then, *W*(*h*,*w*) multiplies the matrix *M*(*h*,*w*) generated by the prediction branch to produce a weight-adjusted prediction matrix *O*(*h*,*w*). The calculation is as follows:

$$O^{(h,w)} = W^{(h,w)} \cdot M^{(h,w)} \tag{7}$$

In this way, pixels with relatively higher traffic density in the historical data obtain higher weights, and pixels with relatively lower density obtain lower weights. The weights thus differentiate the density map generated by the prediction module and improve the quality of the final density map.
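The attention branch can be sketched in Keras as follows; the sigmoid serves as the normalization step, and the input shapes follow our reading of Figure 5 rather than released code:

```python
from keras.layers import Input, concatenate, Conv2D, Multiply

hist_close = Input(shape=(100, 100, 3))   # last three hours
hist_daily = Input(shape=(100, 100, 3))   # same hour, previous three days
m_pred = Input(shape=(100, 100, 1))       # output of the prediction branch

merged = concatenate([hist_close, hist_daily])           # integrate histories
w_att = Conv2D(1, (1, 1), activation='sigmoid')(merged)  # 1x1 conv + normalize
o_out = Multiply()([w_att, m_pred])                      # Equation (7): O = W . M
```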

#### **4. Experimental Results and Analysis**

#### *4.1. Experimental Process and Parameter Setting*

The experimental dataset was from Telecom Italia, and we used the same pre-processing method as [17] to aggregate the 10 min intervals of the original dataset into hours, since the 10 min data were quite sparse and not conducive to extracting spatiotemporal characteristics. Unlike [17], which predicted the receive and send dimensions of SMS and Call separately, we combined the receive and send dimensions and fed the total traffic into the model. All data were normalized before being input into the model, which allowed the model to converge faster and improved the computational efficiency of the fitting process.
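The normalization scheme is not specified; since the output layer is a sigmoid, min-max scaling to [0, 1] is a natural assumption, e.g.:

```python
import numpy as np

def normalize(x):
    """Min-max scale traffic to [0, 1]; keep the extrema to invert later."""
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo), lo, hi

def denormalize(y, lo, hi):
    """Re-adjust predictions to the normal scale before evaluation."""
    return y * (hi - lo) + lo
```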

HSTNet was trained with the Adam optimizer [22], which replaces traditional stochastic gradient descent and iteratively updates the neural network weights based on the training data. Experiments were carried out on the three datasets with a learning rate of 0.01 for 150 epochs, with the learning rate decaying as training progressed. Our model was tested on three datasets: SMS, Call, and Internet. Each dataset contained 1488 (62 days × 24 h) slices. Since the first three days lacked sufficient historical data, we used 52 days of data (1248 slices) from 4 November 2013 to 24 December 2013 as the training set and the data from 25 December 2013 to 1 January 2014 (168 slices) as the test set. In the convolution module, the deformable convolutional layer had 32 filters; the remaining convolutional layers had 16 filters with a kernel size of 1 × 1. The convolutional layer in the attention module had one filter with a kernel size of 1 × 1. The activation function of the convolutional layers was ReLU, except for the last layer, which used the sigmoid activation. HSTNet was implemented with Python 3.7, Keras 2.1.6, and NumPy 1.15.4. The experimental hardware included an AMD R5 2600 CPU, a GTX 1070 GPU, and 16 GB of memory.
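The training setup can be summarized in a few lines of Keras; the decay schedule and the placeholder names (`model`, `x_close`, `x_daily`, `x_time`, `y_true`) are our assumptions:

```python
from keras.optimizers import Adam
from keras.callbacks import LearningRateScheduler

model.compile(optimizer=Adam(lr=0.01), loss='mse')  # loss choice is assumed

# Decay the learning rate as the training epoch increases.
schedule = LearningRateScheduler(lambda epoch: 0.01 * (0.95 ** epoch))
model.fit([x_close, x_daily, x_time], y_true,
          epochs=150, callbacks=[schedule])
```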

Our experiments compared HSTNet with baseline algorithms: the historical average (HA), ARIMA, LSTM, and STDenseNet. During the experiments, the deformable convolution, time embedding module, and attention module were each embedded in STDenseNet to observe the resulting performance improvement, while HSTNet included all of the above improvements. The generated prediction map was re-adjusted to the normal scale and then evaluated against the true values.

We used two indicators, the mean absolute error (MAE) and the root mean squared error (RMSE), to evaluate the model. MAE is the average of the absolute errors and directly reflects the actual magnitude of the prediction errors.

$$MAE = \frac{\sum\_{h=1}^{H} \sum\_{w=1}^{W} \left| p^{\prime (h,w)} - p^{(h,w)} \right|}{H \times W} \tag{8}$$

RMSE represents the square root of the second sample moment of the differences between predicted values and observed values or the quadratic mean of these differences. RMSE is more sensitive to outliers.

$$RMSE = \sqrt{\frac{\sum\_{h=1}^{H} \sum\_{w=1}^{W} \left(p^{\prime (h,w)} - p^{(h,w)}\right)^2}{H \times W}} \tag{9}$$
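Both metrics are straightforward to compute over the predicted and true traffic grids, for example:

```python
import numpy as np

def mae(pred, true):
    """Equation (8): mean absolute error over the H x W grid."""
    return np.mean(np.abs(pred - true))

def rmse(pred, true):
    """Equation (9): root mean squared error; more sensitive to outliers."""
    return np.sqrt(np.mean((pred - true) ** 2))
```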

#### *4.2. Experiment Analysis*

To evaluate the performance of HSTNet, we selected four existing traffic prediction algorithms as the baselines of this experiment: the historical average (HA), ARIMA, LSTM, and STDenseNet. All methods were evaluated with MAE and RMSE on the three datasets. The results are shown in Figure 8.

From Figure 8, we can see that HSTNet's MAE and RMSE performance on the three traffic datasets was ahead of the other algorithms. The historical average is simply calculated from historical data and lacks any mining of deep correlations. ARIMA only considers the historical timing characteristics of the data, without regard for other dependencies. LSTM performed better than the statistical methods but worse than the other deep learning methods. STDenseNet does not consider the impact of external factors. Our model not only better extracted the spatial correlation of the traffic data, but also considered the impact of time attributes on traffic changes, and therefore achieved the best performance.

**Figure 8.** Comparison of prediction performance on the baseline and HSTNet. HA, historical average.

In Table 1, we report the model evaluation results after adding the deformable convolution, the time embedding module, and the attention module to STDenseNet; +DeformConv denotes embedding only deformable convolutions in the baseline. HSTNet was evaluated with all three improvements incorporated. The model achieved the best MAE and RMSE performance on Call, while the effect on SMS was slightly worse. The model also improved on Internet, but its overall performance there was far worse than on Call and SMS, because Internet traffic differed greatly from SMS and Call traffic: in some cases, the gap was close to a factor of ten. Such large traffic variations had a large impact on model performance.

It is worth mentioning that the traffic gap between different cells after combining the receive and send dimensions was about twice that of the original data, which made prediction more complicated than predicting them separately. Therefore, compared to the results in [17], our experiments obtained larger RMSE and MAE on SMS and Call.


**Table 1.** Overall performance of the model.

Figure 9 shows the performance improvement obtained by adding the different modules individually and by HSTNet as a whole. For the SMS dataset, adding the deformable convolution, time embedding, and attention modules improved the RMSE by 2.61%, 3.96%, and 9.09%, respectively, and the MAE by 2.11%, 0.98%, and 3.16%. For the Call dataset, adding the deformable convolution units improved the MAE and RMSE by 6.39% and 5.38%, respectively, and the attention module brought a 10.57% improvement in MAE. For the Internet dataset, the three proposed improvements still improved the results to varying degrees; the time attribute and the attention module were especially effective, each yielding an approximately 10% improvement in MAE.

The three different improvements thus improved the results on the different datasets, verifying the hypotheses in Section 3. Different improvements had different effects on the three datasets: the attention module brought impressive gains on all datasets, while the deformable convolution and time embedding modules improved performance to varying degrees on individual datasets. Overall, the performance improvement created by the attention mechanism was greater than that of the other two improvements.

**Figure 9.** Comparison of different module effects based on STDenseNet.

Embedding the different modules had different effects on the model's runtime and parameter count. Table 2 shows the changes in the time to train one epoch and in the number of parameters under the different configurations, with each module added individually. Although the time embedding module and the attention module added parameters, they had little effect on the running time. Adding the deformable convolution reduced the parameters by about 69 K, but its complex calculations increased the running time by nearly half. Because the three datasets share the same model structure, their running costs did not differ.

Considering the performance and cost of HSTNet, we consider the time embedding module and the attention module the most cost-effective strategies for improving model performance. Deformable convolution improved prediction performance to a certain extent, but also increased the running time of the model. It should be mentioned that DenseBlocks with more filters could obtain better performance at a greatly increased training time: for example, replacing the 32 filters with 64 filters improved the RMSE results above by 2.3%, but nearly tripled the training time on SMS.


**Table 2.** The effects on parameters and the time of each epoch.

In Section 3.1, we mentioned that the input data included the last three time periods and the current time period of the previous three days. We input different time dimensions to analyze the impact on HSTNet's performance; the *N*-dimensional input comprises the data of the last *N* time periods and the current time period of the previous *N* days. As shown in Table 3, the time dimension had a clear impact on the RMSE results. On all three datasets, HSTNet achieved the best performance with three-dimensional input. Performance was slightly poorer with only one- or two-dimensional input, indicating that the model could not extract the spatiotemporal characteristics of the traffic well in those cases. With *N* = 4, performance also decreased, which we attribute to the introduction of more weakly correlated data.


**Table 3.** RMSE results of different input dimensions.

Figure 10 shows, for cell (55,58) of the Internet dataset, the predicted values of the five methods compared to the real values. To visually show the differences in performance, we compared their accumulated errors in the lower right subgraph of Figure 10. The predictions of HA and ARIMA had large errors relative to the true values and lacked accuracy in fitting the peaks. The deep learning-based methods were much better than the traditional methods. HSTNet's overall fit to the traffic was better than the other three methods; especially during peak hours of network traffic, HSTNet achieved more accurate predictions. Compared to STDenseNet, its overall error in this area decreased by nearly 10% thanks to its better spatiotemporal feature extraction capabilities.

**Figure 10.** Comparison of the baseline and HSTNet on cell (50,58).

#### *4.3. Experimental Result*

In this part, we use HSTNet to predict the dataset and analyze the results. To verify the predictive performance of our model, we compared the predicted and actual values of the city's total traffic over one week and performed error analysis.

As shown in Figure 11, the X-axis represents the time interval in hours, and the Y-axis represents the total traffic of the entire city at the corresponding time. Even for the difficult task of predicting overall urban traffic, the model still fit well. The sharp rise in traffic at hour 145 in the figure, caused by the 2013 New Year's Eve celebrations, produced a large error; the model's ability to fit traffic fluctuations caused by unexpected events still needs to be improved.

HSTNet had the best prediction effect on the Call dataset. From the top right subgraph of Figure 11, it can be seen that the error could be kept within a certain range except for the impact of the unexpected event. The prediction of Internet traffic at night was less satisfactory: there was a significant error during this period, as shown in the lower right subgraph, because the three types of traffic vary on different scales. The Internet's day-night traffic gap was approximately two million (versus only approximately 500 k for SMS and Call), which created considerable difficulty in forecasting.

HSTNet also had good predictive performance for traffic in different areas of the city. Figure 12 compares the predicted and real images at 10 o'clock on 24 December 2013; each image has 100 × 100 cells, and the brightness of a cell indicates its traffic load, with brighter areas carrying more traffic. The comparison shows that our model could accurately predict regions with different traffic distributions and could extract the spatial correlation of urban cellular traffic.

**Figure 11.** Comparison of hourly traffic prediction with the ground truth.

**Figure 12.** Comparison of predicted and real images.

The last row in Figure 12 shows the error between the corresponding prediction map and the ground truth, with brighter areas indicating greater error. We can see that the prediction error was not only concentrated in the heavily loaded downtown area; there were also large errors in many suburbs with heavy traffic loads. This shows that changes in cellular traffic also depend on many factors that we have not considered, and a more sophisticated model is needed in the future to analyze the characteristics of cellular traffic changes.

#### **5. Conclusions**

This paper was devoted to the prediction of cellular network traffic. We conducted in-depth research on the spatiotemporal correlation of cellular networks, analyzed various factors that affect traffic changes, and then proposed HSTNet, a hybrid deep learning model for spatiotemporal traffic prediction that augments a densely connected convolutional network with deformable convolution, a time embedding module, and an attention module.

The experimental results showed that, compared with the existing methods, the hybrid spatiotemporal network HSTNet proposed in this paper could better extract the spatiotemporal characteristics of image-like traffic data and improve the prediction accuracy, thereby making traffic prediction more effective.

There are still many aspects of our work that need improvement.

In future work, traffic prediction modeling needs to consider not only the use of more sophisticated networks to extract features, but also the analysis and introduction of external data in multiple dimensions. It is worth mentioning that the introduction of our cellular traffic prediction scheme into traffic prediction problems in other similar contexts is also a very worthwhile research direction.

**Author Contributions:** Conceptualization, D.Z.; data curation, B.Y.; formal analysis, L.L.; funding acquisition, D.Z.; writing, original draft, D.Z. and L.L.; writing, review and editing, C.X. and Q.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Natural Science Foundation of China Grant Numbers 61402397, 61263043, 61562093, and 61663046.

**Acknowledgments:** This work is supported by: (i) the Natural Science Foundation China (NSFC) under Grant Nos. 61402397, 61263043, 61562093, and 61663046; (ii) Yunnan Provincial Young academic and technical leaders reserve talents under Grant No. 2017HB005; (iii) the Yunnan Provincial Innovation Team under Grant No. 2017HC012; and (iv) the Youth Talents Project of the China Association of Science and Technology under Grant No. W8193209.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
