1. Introduction
Nowadays, there is a great demand for Internet application services, such as video [1] and audio streaming, Voice-over-IP (VoIP) [2,3] and online games [4], among others. Multimedia services represent more than 50% of current Internet traffic [5]. VoIP is one of the most popular communication services, not only because of its low call rates compared to conventional telephony [6], but also because of the high speech quality levels achieved in recent years [7]. Thus, network providers need to perform monitoring and operation tasks to ensure an acceptable end-user Quality-of-Experience (QoE).
In ad-hoc wireless networks, ensuring reliable network performance is a great challenge due to the characteristics of this kind of network [8]. Dynamic topology, shared wireless channels and limited node capabilities are factors that need to be considered in order to provide a high-quality VoIP service. For instance, device batteries are limited resources, and power failures can lead to the loss of the links connected to those nodes [9].
In a VoIP communication, the end-user QoE is determined by the user's perception [10,11,12]. In general, speech quality assessment methods can be divided into subjective and objective methods. Subjective methods are performed in a laboratory environment using a standardized test procedure [13]: several listeners score an audio sample, and the average value, named the Mean Opinion Score (MOS), is computed. However, subjective methods are time-consuming and expensive [14]. Another way to predict the quality of a VoIP call is through a parametric method, such as the E-model algorithms [15,16], which provide a conversation quality index estimated from different parameters related to the acoustic environment, speech codec characteristics and network performance.
Several factors, such as channel transmission capacity, node processing capacity and routing protocols, affect network performance parameters [17].
Conventional routing protocols in ad-hoc networks, such as Optimized Link State Routing (OLSR), are unable to learn from abnormal network events that occurred several times in the past [18]; hence, those protocols can choose a path that had recurrent problems. For example, let us consider a path P in which a given node N presents recurrent shutdowns due to either device failures or programmed power-offs to save energy [19]. If a conventional protocol chooses this path P, network degradation, such as packet losses, can occur [20]. A routing protocol that is able to learn from previous network failure events could avoid this path, improving the network performance. Hence, there is a need for protocols capable of learning from the network data history. Therefore, it is important that routing protocols use strategies that make them learn from past experiences to choose optimal routing paths [21].
In recent decades, Machine Learning algorithms have come to be used in several applications [22,23,24,25,26,27,28]. Thus, these algorithms can also be applied to routing control protocols [29,30,31]; specifically, Reinforcement Learning (RL) is increasingly being used to solve routing problems [32,33,34]. In RL, an agent must be able to learn how to behave in a dynamic environment through interactions with it [35]. For instance, an agent that makes a choice receives a reward or a punishment depending on whether the choice was good or bad, respectively. Hence, the RL technique can improve the decision making of the path choice process, leading to better network performance and, consequently, improved application services, such as a VoIP communication [36].
In Reference [37], the authors introduce a generic model based on RL for ad-hoc networks focusing on routing strategies. Some works use RL for routing in urban vehicular ad-hoc networks (VANETs) [32]. Other works focus on wireless sensor networks and their characteristics [38] or on unmanned robotic systems [39].
In Reference [18], intelligent traffic control through deep learning is proposed, and the results demonstrated a performance gain compared to the traditional Open Shortest Path First (OSPF) routing protocol. In Reference [21], the authors use Deep Reinforcement Learning to develop a new general-purpose protocol and obtained superior results compared to OSPF. However, neither work focuses on ad-hoc networks, and neither compares the developed algorithm with ad-hoc network protocols. In Reference [40], a Reinforcement Learning Routing Protocol (RLRP) that can be applied to ad-hoc networks is proposed.
Routing protocols require control messages for their operation; these messages are responsible for route discovery and for the dissemination of topology information, among other tasks. However, control messages generate overhead on the network, thus decreasing network capacity, especially in situations where the transmission channel may suffer interference or be saturated.
The use of the RL technique in routing protocols may require an extra header, new control messages or an increase in the sending frequency of these messages. There are studies that aim to reduce the overhead in traditional protocols. In protocols that use RL, a mechanism that reduces this overhead is particularly relevant, because these routing techniques generate additional overhead.
In RL, there is an agent that interacts with an environment through the choice of actions [35]. Each action generates a reward that, in general, defines whether the action taken was good or bad. In Reference [40], the rewards are sent to the nodes through control messages using a reward header, which generates an overhead due to the use of RL. This additional overhead impacts the overall network performance.
In this context, there are research initiatives focused on decreasing the overhead originated by control messages. In Reference [41], the authors propose an adjustment to the interval for sending Hello messages of the AODV protocol in a Flying Ad-hoc Network (FANET) scenario, focusing on reducing the energy consumption of unmanned aerial vehicles (UAVs) by reducing the frequency of Hello messages.
The results show a reduction in energy consumption without loss of network performance. Despite presenting relevant results, that work focuses on FANETs and their specific characteristics. In Reference [42], the authors propose three algorithms to adjust the time to send Hello messages. The first algorithm is called Reactive Hello, where Hello messages are only sent when the node wants to send some packet; in other words, neighborhood discovery is performed only when the node needs to transmit. Despite reducing the overhead, since the number of messages is reduced, this approach can degrade the network if mobility is high, because topology changes will only be noticed when a node needs to send a packet. The second method is called Event-Based Hello, and the adjustment is made based on the events that occur in the network. In this approach, a node initially sends Hello messages at the default frequency, but if, after a predefined period of time, that node does not receive any Hello message from a neighbor or does not need to send packets, it stops sending Hello messages. The problem with this approach is that, if all the nodes in the network move away and, after that time period, come back closer, no node will send Hello messages, and the topology information will remain outdated until some node decides to send a packet, with the same problem as the Reactive Hello approach. In the third method, called Adaptive Hello, each node in the network sends a Hello message after moving a defined distance. The problem with this algorithm is that each node needs to know its own position. In Reference [43], the frequency depends on the speed of the nodes, and the problem with this approach arises when there are nodes that do not move but disconnect, for example, to save energy.
The works previously mentioned demonstrate that a dynamic adjustment reduces the overhead in relation to the simplistic model in which the frequency of sending messages is statically defined using fixed values. In this context, the goal is for the algorithm to adjust the sending of Hello messages according to the mobility of the network. Mobility occurs when a node moves out of the reach of its neighbors, shuts down or becomes inoperative. In the case of mobility events, the frequency is adjusted to higher values so that the new network information can converge quickly. If there is no mobility in the network, the frequency should be reduced, but not suspended as proposed in other works.
In this context, the main contributions of this paper can be summarized as follows:
To develop an enhanced routing protocol based on the RL technique, named e-RLRP, that is able to learn from the network event history, avoiding paths with connection problems, and that is also able to reduce the number of control messages. The routing algorithm based on RL is developed according to Reference [40].
Implementation of an algorithm that compensates for the overhead introduced by the messages related to the RL algorithm in the RLRP. To the best of our knowledge, a dynamic adjustment of the Hello message time interval to compensate for this overhead has not been addressed in other RL-based routing protocols. Thus, the present research contributes to advancing the state-of-the-art of this type of protocol.
The performance of the proposed method is compared to other widely used routing protocols, such as the Better Approach To Mobile Ad-hoc Networking (BATMAN) and Optimized Link State Routing (OLSR), and also to the RLRP protocol. To this end, different network topologies and traffic flows were implemented. The performance comparison considers key network parameters, such as throughput, packet loss rate and delay. Also, the perceptual speech quality of a VoIP communication service is evaluated, in which two operation modes of the AMR-WB speech codec [44] are used.
The algorithm to compensate for the overhead caused by the use of RL is based on reducing the overhead generated by another control message, the Hello message, which is responsible for disseminating information about the neighborhood of each network node. A dynamic adjustment of the frequency of sending Hello messages is capable of reducing the global overhead. The algorithm proposed in this work adjusts the sending of Hello messages according to the mobility of the network. Thus, this work contributes to the improvement of routing protocols based on the RL technique, because it addresses one of the deficiencies of these protocols, namely the increased number of control messages and, therefore, the increased network overhead.
In this work, different ad-hoc multihop network scenarios are implemented, considering different network topologies, a variable number of nodes, different traffic flows and several degrees of network mobility. In order to simulate network failures, some nodes drop at random instants during each simulation. In these scenarios, VoIP traffic is simulated and used as a case study. To this end, UDP traffic is defined between a pair of source and destination nodes, and some nodes in the network are randomly turned off in order to simulate a network failure. Thus, it is possible to obtain network parameters, such as throughput, delay, packet loss rate and the number of control messages sent to the network, which are used to evaluate the impact of the routing algorithm on the perceptual quality of the VoIP communication according to the E-model algorithm described in ITU-T Recommendation G.107.1 [16]. It is important to note that the VoIP service is used as a specific case study, but the proposed routing algorithm is general purpose and agnostic of the service application. Finally, experimental performance results show that the proposed e-RLRP overcame, in most of the test scenarios used in this work, the other routing protocols used for comparison purposes. The e-RLRP provides an overhead reduction of up to 18% compared to RLRP. The case study demonstrates that e-RLRP can provide a VoIP communication quality improvement of more than 90% compared to OLSR, and up to 8% compared to RLRP.
The remainder of this paper is structured as follows. In Section 2, a theoretical review is presented. The proposed routing algorithm based on RL is described in Section 3. In Section 4, the different steps of the experimental setup are described. Section 5 presents the experimental results. Finally, the conclusions are presented in Section 6.
3. The Proposed e-RLRP Algorithm
In this section, the proposed e-RLRP algorithm is explained. Firstly, the RL technique in the routing protocol is implemented according to Reference [40]. Later, the proposed method to reduce the overhead is detailed.
3.1. Reinforcement Learning Used in Routing Protocol
In this subsection, the reward propagation with the Acknowledgment message, the reward generation and the estimation values are presented.
3.1.1. Reward Propagation with Acknowledgment Message (ACK)
The reward value is directly related to the receipt of the ACK. When a node wants to send a packet to a given destination, it selects one of its existing neighbors and forwards the packet to that neighbor. After that, it waits for the corresponding ACK message, which contains meta-information about the received packet and the reward value for the action of choosing this neighbor. This ACK message can return through a path different from the one used to send the corresponding packet.
If the ACK is not received within a pre-defined time, the sender node assigns a punishment, that is, a negative reward, to the neighboring node to which the packet was forwarded. This negative value is set to −1. If the ACK is not received, the neighboring node has probably gone offline; it may be experiencing hardware issues such as power outages or strong interference in the wireless transmission, or it may be overloaded with incoming traffic. Hence, it is consistent that this neighbor should be avoided in the future.
If the ACK message is received on time, a reward value is provided within the message. If the value is high, it means that the neighbor has a good path to the destination, and the probability of choosing this neighbor in the future will increase. If the value is low, it means that the chosen neighbor does not have a good route to that destination, because it has hardware problems, there may be many hops, or the quality of the subsequent links is poor. In this case, the source node will slowly decrease the estimation value for this neighbor, which is likely to cause the node to choose other neighbors later.
3.1.2. Reward Generation
The mechanism for adjusting the reward value must be flexible; that is, the adjustment must be neither so small that it does not cause changes nor so large that it induces sudden changes due to specific events. For example, if the value of the punishment after choosing a bad route is too low, the estimated value of that route will decrease slowly, and this bad route may still be chosen for a long time. On the other hand, if the punishment value is too high, a route may no longer be chosen because of just one packet loss event. Therefore, a balance must be found between low and high rewards/punishments.
According to Reference [40], the reward value is calculated as follows. When a node X receives a packet from node Y, an ACK carrying the reward value r is sent back to Y. To calculate r, the sum of the estimated values that each neighbor of X has in relation to the destination node Y is divided by the corresponding number of neighbors (N); thus, r is the average of the Q values of the neighbors in relation to node Y:

r = (1/N) · (Q_1 + Q_2 + ... + Q_N),

where Q_i is the estimation value that neighbor i holds towards node Y. Upon receiving r, node Y adjusts the estimated value for node X. However, if the ACK is not received, node Y automatically sets the reward value to −1; that is, a punishment is generated that negatively impacts the estimated value for the route. The estimation value is defined in the next subsection.
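For illustration only, the Python sketch below reproduces this reward generation logic; the function and variable names (generate_reward, PUNISHMENT) are ours and do not correspond to the actual RLRP source code.

PUNISHMENT = -1  # negative reward assigned when the ACK is not received in time

def generate_reward(neighbor_q_tables, dst_y):
    """Reward sent back to node Y: average of the Q values that the receiving
    node's neighbors hold towards Y (0.0 if no estimates exist yet)."""
    values = [q[dst_y] for q in neighbor_q_tables if dst_y in q]
    return sum(values) / len(values) if values else 0.0

# Example: two neighbors hold estimates 40 and 60 towards node "Y"
print(generate_reward([{"Y": 40.0}, {"Y": 60.0}], "Y"))  # -> 50.0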
3.1.3. Estimation Values Based on Rewards
An initial value must be set for each node when the protocol starts, a situation often called a cold start. The RLRP initially assigns a value of 0 to all neighbors when a source node has no route information towards a destination node. The available range of estimation values is defined as [0, 100]. When the protocol starts the route discovery process, the estimation values are set as a function of the hop count, where Q_n is the estimated value for the destination IP towards neighbor n and H is the number of hops that the RREQ or RREP messages have traversed from the source to the destination node.
After the path discovery procedure ends, all nodes in the network have the initial estimated values for all routes. According to the calculation presented in (10), the estimation value is initially defined based on the number of hops between the source and the destination, so that routes with fewer hops receive higher initial values. Hence, the RLRP initially follows a hop-count metric, in which the routes with the fewest hops are chosen.
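As a purely illustrative sketch, the snippet below assumes that the initial estimation value is inversely proportional to the hop count and scaled into the [0, 100] range; the exact expression of Equation (10) in the RLRP may differ.

def initial_estimation(hop_count, scale=100.0):
    """Illustrative initial Q value (assumed formula): fewer hops yield a higher
    starting estimate, bounded to the [0, 100] range used by the protocol."""
    return min(scale, scale / max(hop_count, 1))

print(initial_estimation(1))  # 100.0 -> direct neighbor
print(initial_estimation(4))  # 25.0  -> longer route starts with a lower estimate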
However, these values are afterwards adjusted, since the route with the fewest hops is not always the best one; a route may have the fewest hops but present an overloaded link or contain nodes that malfunction. The adjustment is made according to the received reward value. The estimation value Q, as described in Section 2.1, is updated as follows:

Q_(k+1) = Q_k + α · (r_k − Q_k),

where Q_(k+1) is the new estimation value for the action; Q_k is the current estimation value; r_k is the reward value obtained; α is the step-size parameter; and k is the current step number. Therefore, as stated above, the estimated value is impacted by the reward value.
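The incremental update described above can be sketched in Python as follows; the step-size value and the clipping to the [0, 100] range are illustrative assumptions.

def update_estimation(q_current, reward, step_size=0.5):
    """Incremental estimation update: Q_(k+1) = Q_k + step_size * (r_k - Q_k),
    clipped to the [0, 100] range of estimation values (clipping assumed)."""
    q_new = q_current + step_size * (reward - q_current)
    return max(0.0, min(100.0, q_new))

print(update_estimation(50.0, 80.0))  # a good reward pulls the estimate up -> 65.0
print(update_estimation(50.0, -1.0))  # a punishment pulls the estimate down -> 24.5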
In e-RLRP, the reward is associated with the successful delivery of packets. In general, the local reward is given to the route that has the best packet delivery success rate, and the long-term reward is related to the global network performance obtained by always looking for the routes with the highest success rates. As explained in Section 2.1, the RL algorithm has to balance two approaches in order to obtain a long-term reward: selecting the actions that obtain the highest reward values, or exploring new actions that can generate even better rewards. For this decision task, the e-RLRP uses the Softmax method [60].
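A minimal sketch of Softmax-based neighbor selection is given below; the temperature value is illustrative and is not taken from the protocol specification.

import math
import random

def softmax_choice(neighbors, q_values, temperature=5.0):
    """Choose the next-hop neighbor with probability proportional to
    exp(Q / temperature): high-value routes are preferred (exploitation)
    while lower-value routes keep a non-zero chance (exploration)."""
    weights = [math.exp(q_values[n] / temperature) for n in neighbors]
    return random.choices(neighbors, weights=weights, k=1)[0]

# Example: neighbor "B" is usually chosen, but "A" and "C" are still explored
print(softmax_choice(["A", "B", "C"], {"A": 20.0, "B": 60.0, "C": 35.0}))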
3.2. Algorithm Used in the e-RLRP to Reduce the Overhead
To send a packet, a node needs to know which neighbor nodes are directly connected to it. Therefore, a neighborhood discovery procedure is required. In RLRP, this procedure is performed through the broadcasting of messages called Hello messages.
By default, Hello messages are sent every 2 s; thus, the information about neighbors is updated within the same period of time. This update interval parameter is called the Broadcast Interval (BI). The RLRP has 10 types of headers, two of them being the Reward Header and the Hello Header. The structures of the data fields of the Reward Header and the Hello Header are shown in Table 3 and Table 4, respectively.
The Reward Header is 8 bytes long. The Type field defines the header type, and the ID field is the unique identifier of the message service. The Neg Reward Flag field indicates whether the reward is negative or positive, the Reward Value field carries the reward value and, finally, the Msg Hash field is the identifier of the packet to which the reward belongs.
The Hello Header size ranges from 4 to 56 bytes; this variation depends on the node addresses, which can be IPv4 or IPv6. The Type field defines the header type, and the IPv4 Count field defines the number of assigned IPv4 addresses, limited to one. The IPv6 Count is the number of assigned IPv6 addresses, limited to three. Tx Count is the number of re-broadcasts, GW Mode defines whether a node is a gateway in the network, and the IPv4 and IPv6 address fields carry the addresses of the node.
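For reference, the two headers can be sketched as Python dataclasses at the field-name level only; the concrete bit-level encoding given in Tables 3 and 4 is not reproduced here.

from dataclasses import dataclass, field
from typing import List

@dataclass
class RewardHeader:          # 8 bytes in total (Table 3)
    type: int                # header type identifier
    id: int                  # unique identifier of the message service
    neg_reward_flag: bool    # True if the reward is a punishment (negative)
    reward_value: float      # reward assigned to the forwarding action
    msg_hash: int            # identifier of the packet the reward refers to

@dataclass
class HelloHeader:           # 4 to 56 bytes, depending on the addresses (Table 4)
    type: int                # header type identifier
    ipv4_count: int          # number of assigned IPv4 addresses (at most 1)
    ipv6_count: int          # number of assigned IPv6 addresses (at most 3)
    tx_count: int            # number of re-broadcasts
    gw_mode: bool            # whether the node acts as a gateway
    ipv4_address: str = ""   # assigned IPv4 address, if any
    ipv6_addresses: List[str] = field(default_factory=list)  # assigned IPv6 addresses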
As can be observed in Table 3, the Reward Header used in RLRP is 8 bytes long and generates an additional overhead, which corresponds to the use of the RL technique.
In this context, the present research implements an algorithm to reduce the overhead generated by the Hello message, specifically by reducing the frequency of sending Hello messages in order to compensate for the additional overhead generated by the Reward Header.
It is clear that increasing the time interval for sending Hello messages, defined by the BI parameter, will decrease the frequency of these messages and, consequently, the overhead. However, a high value also delays the update of information about the neighborhood, and the routing can be negatively affected.
Thus, the proposed algorithm implemented in the e-RLRP is capable of dynamically adjusting the frequency of sending Hello messages. This adjustment of the BI parameter is made according to the mobility present in the network. If the network is static, that is, no neighbors enter or leave the coverage range, it is not necessary to send Hello messages with a high frequency. Otherwise, if the network presents high mobility, it is necessary to send messages more frequently.
A high-level representation of the proposed algorithm is introduced in Figure 2.
The sending of Hello messages starts together with the e-RLRP daemon. Next, the algorithm checks the mobility of the network. To this end, there is a function named Update Neighbors File, responsible for updating the list of neighbors every time a Hello message from a new node is received, and another function named Check Expired Neighbors, which checks every 7 s whether a Hello message has been received from each neighbor; if a neighbor spends 7 s or more without sending a Hello, it is removed from the list because it is considered out of reach. This interval of time was defined experimentally in Reference [40]. If a new neighbor is detected or an existing one is lost, it is considered that there is mobility in the network.
In the proposed e-RLRP, when mobility occurs, the BI value is reduced to a lower limit called the BI Lower Limit (BI_LL); the algorithm then waits for the new time interval, sends a message and restarts the process. If a mobility event does not occur, the BI parameter is increased by the Adjustment Factor (AF) parameter, respecting an upper limit called the BI Upper Limit (BI_UL). When the time defined by BI is reached, a Hello message is sent and the process is restarted. Hence, the frequency of sending Hello messages is adjusted according to the mobility of the network, as sketched below.
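The following Python sketch summarizes the dynamic adjustment loop described above; the limits BI_LL and BI_UL are taken from the text, while the helper callables (detect_mobility, send_hello) are hypothetical placeholders.

import time

BI_LL = 1.0   # lower limit of the Broadcast Interval (s)
BI_UL = 6.0   # upper limit of the Broadcast Interval (s)

def hello_sender(detect_mobility, send_hello, adjustment_factor=0.1):
    """Illustrative loop: shrink BI to BI_LL when mobility is detected,
    otherwise grow it by the Adjustment Factor up to BI_UL."""
    bi = BI_LL
    while True:
        if detect_mobility():          # new neighbor found or existing one lost
            bi = BI_LL                 # react quickly to topology changes
        else:
            bi = min(bi + adjustment_factor, BI_UL)  # relax when the network is static
        time.sleep(bi)                 # wait for the current Broadcast Interval
        send_hello()                   # broadcast the Hello message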
It is worth mentioning that the proposed dynamic adjustment is not based on RL, because RL uses more computational resources.
3.2.1. Definition of the Upper and Lower Limits of Broadcast Interval
The higher the value of the BI parameter, the lower the frequency of sending Hello messages and, consequently, the lower the overhead. However, it is necessary to define a limit so that this value does not grow indefinitely.
The BI_UL cannot be greater than or equal to 7 s due to the Check Expired Neighbors function; otherwise, network nodes would be eliminated when the function timeout is reached. Considering that the value must be lower than 7 s, and also the latency of the network, the value of 6 s is defined for BI_UL in order to guarantee that neighbors are not erroneously removed.
To define the BI_LL, three BI values lower than 2 s were tested in the scenario called Programmed, which is described in Section 4.1. The overhead was calculated considering the source and destination nodes. The BI value of 2 s, defined in the RLRP, was also tested in the same scenario, and the overhead obtained was 1.42 MB. The BI values of 0.5, 1.0 and 1.5 s were tested, and Table 5 shows the overhead results for each BI value.
Table 5 shows the increase in overhead of the tested values in relation to the default value used in RLRP. The BI value of 1.5 s produced an increase of 2.12%, and the value of 1.0 s an increase of 4.22%. The value of 0.5 s produced the highest increase, of approximately 12.67%. Considering this a high increase in overhead compared to the previous ones, the value of 0.5 s was discarded. Hence, we opted for the intermediate tested value, and BI_LL is set to 1.0 s.
3.2.2. Adjustment Factor
The objective of the e-RLRP is to reduce the overhead without degrading the performance of the algorithm. Thus, after a scenario of high mobility is detected, the rise of the BI parameter should be slower, ensuring that the upper limit is reached slowly, because there is a likelihood that the mobility events will repeat themselves. In a scenario in which an isolated episode of mobility occurs, the climb can be a little faster. Therefore, the AF should also respond according to the mobility of the network. It is important to note that, in initial tests, we used fixed values for the frequency of Hello messages, and the results demonstrated that dynamic methods achieve better results in terms of the network performance parameters used in this work.
To ensure that no sudden changes occur in the AF, a scale of ten positions is defined, in which the upper limit is called AF_max and the minimum value is called AF_min.
In the Programmed scenario described in Section 4.1, the convergence time (CT) of the e-RLRP, defined as the time elapsed from the breaking of a route until the algorithm converges to a new route, was also evaluated. Experimental test results demonstrated that the average CT is 20.6 s.
The AF value cannot be so high that the BI parameter reaches BI_UL before the CT elapses. To determine this limit, an Arithmetic Progression (AP), also known as an arithmetic sequence, is applied, in which the common difference between consecutive terms is AF, the first term A_1 is BI_LL, the last term A_n is BI_UL, and the sum of the terms must not be greater than CT.
To ensure that BI_UL is not reached before 20.6 s, the CT value is rounded to 21 s, and applying the formula of the sum of an AP,

S_n = n · (A_1 + A_n) / 2 ≤ CT, that is, n · (1.0 + 6.0) / 2 ≤ 21, (12)

the value n = 6 is obtained.
Applying the result of Equation (12) in the formula for the general term of an AP,

A_n = A_1 + (n − 1) · AF, that is, 6.0 = 1.0 + (6 − 1) · AF,

the value obtained for AF is 1; then, the maximum value of AF (AF_max) should be 1. As previously stated, a scale of 10 positions was defined; then, the value of AF_min is 0.1, and each position of that scale is increased by 0.1.
Whenever mobility occurs in the network, the AF is decreased along the scale. It is increased when there is a tendency for mobility to decrease during a given period of time.
This period of time, called the Time of Check (TC), is defined as the average between the CT value and the time the algorithm takes to go from BI_LL to BI_UL using the adjustment AF_min. To calculate TC, first, the formula of the general term of an AP is applied, where AF_min is the common difference and BI_UL and BI_LL are the terms A_n and A_1, respectively:

A_n = A_1 + (n − 1) · AF_min, that is, 6.0 = 1.0 + (n − 1) · 0.1, which gives n = 51. (14)

Applying the result of Equation (14) in the formula for the sum of an AP and averaging with CT:

TC = (S_n + CT) / 2 = (51 · (1.0 + 6.0) / 2 + 21) / 2 ≈ 99.7.
The TC value is 99.7 s; in this way, after 99.7 s, if there is a tendency for mobility to decrease, the AF will be increased. A mobility counter is used to count how many mobility events occur within the TC time period, that is, how many times a new neighbor comes within range of a given node or an existing neighbor leaves the range of that node.
Belonging to a family of statistical approaches used to analyze time-series data in the areas of finance and technical analysis [61], the Exponential Moving Average (EMA) can be used to estimate values [61,62,63]. The EMA is used to verify whether the occurrence of mobility tends to increase or decrease, according to Equation (17). The EMA is applied to a series of 10 values of the mobility counter.
If the EMA indicates that the number of mobility events has a tendency to decrease, the AF value is increased. The period N of 10 values was chosen precisely because it is the number of times that AF must be increased until reaching AF_max.
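The trend check can be sketched with the standard EMA recursion; the smoothing constant 2/(N+1) is the usual textbook choice and is assumed here, since Equation (17) is not reproduced in this excerpt.

def ema(series, n=10):
    """Standard Exponential Moving Average over the last n mobility-counter samples."""
    alpha = 2.0 / (n + 1)              # usual EMA smoothing constant (assumed)
    value = series[0]
    for x in series[1:]:
        value = alpha * x + (1 - alpha) * value
    return value

def mobility_decreasing(counters, n=10):
    """AF is increased when the EMA of the mobility counter shows a downward trend."""
    return ema(counters[-n:], n) < ema(counters[-n - 1:-1], n)

print(mobility_decreasing([5, 5, 4, 4, 3, 3, 2, 2, 1, 1, 0]))  # True: mobility is falling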
The scheme of the AF adjustment algorithm is shown in Figure 3.
Thus, the BI is adjusted according to the mobility of the network, making it possible to reduce the overhead.
4. Experimental Setup
In this section, the different network scenario configurations used in the simulation tests for the performance validation of the proposed e-RLRP are described. Different network topologies with different numbers of nodes, routes, traffic flows and network mobility conditions are considered. Firstly, the four network topologies used in the simulations are described. Later, the two simulation scenarios are explained. Finally, the transmission rates used in the scenarios are presented, and the simulation environment is described.
4.1. Network Topology
In this work, four network topologies were created to simulate a wireless node network; they are called T1, T2, T3 and T4. Node names were assigned in order to improve the understanding of the scenarios that will be described later. The topologies were designed to guarantee that each route has a different number of hops. In Topology T1, there are 3 routes and 8 nodes, as illustrated in Figure 4.
Topology T2 is an extension of T1 with the addition of three nodes; thus, in total there are 11 nodes and four different routes, which are distributed according to Figure 5.
T3 is illustrated in Figure 6. This topology is also an extension of T1, but with five extra nodes; thus, there are a total of 13 nodes and 5 different routes.
T4 is illustrated in Figure 7. This topology, like the others, is an extension of T1, but with eight extra nodes; thus, there are a total of 16 nodes and 6 different routes.
4.2. Emulation Scenario
In order to test the functionalities of the e-RLRP, two different scenarios were developed in which there are routes that degrade network performance. To this end, some nodes in the network were configured to disconnect on a recurring basis at random instants, simulating node failures and mobility in the network.
In the first scenario, only topology T1 is used. A flow is defined with node C as the source and node E as the destination, and node D is programmed to shut down 5 times. This node is part of the shortest route between the source and destination of the traffic in T1. For ease of later reference, the first scenario is named Programmed (P). The scenario P is a proof of concept to test the RL in the e-RLRP, in which a better performance than that of the other protocols is expected. In principle, the route with the least number of hops is the best path and is the one that should be chosen initially by all protocols. However, in this scenario, the choice of this path will cause degradation in the network, since there is a node that recurrently disconnects, causing packet loss. As the e-RLRP can learn from the network, it should be able to avoid the path containing nodes that present recurrent drops.
In the second scenario, named Random (R), a flow is also defined with node C as the source and E as the destination. Nodes A, D and G of topology T1; nodes A, D, G and J of T2; nodes A, D, G, J and L of T3; and nodes A, D, G, J, N and O of T4 are randomly disabled at different instants, in order to simulate random drops. In addition, 3 configurations of drops are defined. In the first configuration, 3 drops are drawn among the aforementioned nodes for each topology; in the second configuration, 5 drops are drawn; and in the third configuration, 7 drops are considered. The reason for choosing only these nodes is to ensure that each route has only one node that fails; thus, the same probability of drawing a drop for each route is ensured. These nodes are randomly disconnected in each simulation. The instants at which each node drops during the simulation are randomly defined; therefore, the routing algorithm does not know in advance which node is down in order to avoid that path. The objective of this scenario is to test the e-RLRP in a random scenario in which the network degradation increases. The network scenarios used in this research are different from scenarios in which node drops are controlled and a scheduler can be implemented in the network. It is important to note that the e-RLRP could also work in conjunction with a scheduler in more complex network scenarios, but these scenarios are out of the scope of the present research.
Additionally, two different configurations of scenario R are defined for topologies T3 and T4, where the number of flows is greater than one. A network configuration with 3 flows is defined, in which the first flow is from node C to E, the second from node F to B and the third from node I to H. The second network configuration considers 4 flows, where an additional flow from node M to K is added to the three previously mentioned flows. The objective of these two configurations is to investigate the impact of the additional network overhead due to the RL control messages.
In both scenarios, the ability of the e-RLRP to avoid routes that degrade the network through the use of RL is tested. The ability of the e-RLRP to reduce the network overhead in mobility scenarios is also evaluated, providing a higher throughput and reducing the Ppl value.
Table 6 shows which nodes have been configured to shut down, simulating a drop, in the T1, T2, T3 and T4 topologies for all scenarios. The Programmed scenario uses only T1 because it is a proof-of-concept scenario.
4.3. Transmission Rates of AMR-WB Codec
This work also aims to test the impact of the previously mentioned routing protocols on a real communication service; to this end, a VoIP communication scenario is used as a case study. Thus, traffic from node C to E is simulated with different bit rates defined according to the AMR-WB codec. In addition, we used UDP communication and a packet time-length of 20 ms.
A speech signal transmitted over an IP network is compressed by a speech codec, and then this payload must be packetized. To this end, the Real-time Transport Protocol (RTP), UDP and IP headers are inserted. The bit rates presented in Table 2 refer only to the payload; thus, it is necessary to add the number of bits of the RTP (12 bytes), UDP (8 bytes) and IP (20 bytes) headers to obtain the transmission rate. For example, AMR-WB Mode 2 (12.65 kbps) produces 253 bits that are sent every 20 ms; if the 320 bits of headers are added, a total of 573 bits are sent in this same period of time, which represents a transmission rate of 28.65 kbps.
Table 7 shows the transmission rates used in the test scenarios.
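The packetization overhead described above can be reproduced with a few lines of Python; the 23.85 kbps value assumed for AMR-WB Mode 8 follows the standard codec mode set.

HEADER_BITS = (12 + 8 + 20) * 8   # RTP + UDP + IP headers = 40 bytes = 320 bits
FRAME_PERIOD_S = 0.020            # one speech frame every 20 ms

def transmission_rate_kbps(codec_rate_kbps):
    """Bit rate on the wire: codec payload per 20 ms frame plus RTP/UDP/IP headers."""
    payload_bits = codec_rate_kbps * 1000 * FRAME_PERIOD_S   # e.g., 12.65 kbps -> 253 bits
    return (payload_bits + HEADER_BITS) / FRAME_PERIOD_S / 1000

print(transmission_rate_kbps(12.65))  # AMR-WB Mode 2 -> 28.65 kbps
print(transmission_rate_kbps(23.85))  # AMR-WB Mode 8 -> 39.85 kbps (assumed codec rate)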
4.4. Emulation Environment
To test and analyze the performance of the four protocols previously mentioned, we use the Common Open Research Emulator (CORE) [64]. Developed by Boeing's Research and Technology division, CORE is a real-time, open-source network emulator. CORE was chosen because it enables the use of real-world routing protocols and applications through Linux system virtualization, and the e-RLRP code must be executed on a Linux platform. Each node in the emulator is a virtual machine with a network interface and resources shared with the host machine. The e-RLRP, RLRP, BATMAN and OLSR routing protocols are installed to be used by the network nodes.
The network performance metrics obtained in the tests were throughput, Probability of Packet Loss (Ppl), Round Trip Time (RTT) and overhead. The throughput and Ppl values are calculated using the Iperf tool [65], which is capable of generating UDP and TCP traffic streams at defined rates. To calculate the RTT, the UDP stream is replaced by an ICMP stream generated by the native Linux PING command, which itself returns the RTT value. The overhead is measured using the Wireshark tool [66]. In addition to the aforementioned tools, native Linux shell scripts are used to shut down nodes on a programmed or random basis.
Finally, the speech quality of a VoIP communication is evaluated. To this end, the network parameters, such as Ppl and delay, are used as inputs of the E-model algorithm to estimate the communication quality.
5. Results and Discussions
In order to evaluate the e-RLRP performance in relation to BATMAN, OLSR and RLRP protocols, different network scenarios were emulated. Each simulation scenario runs 50 times, and the average value for each scenario is computed. The simulation of each scenario takes 600 s.
In the test scenarios, the AMR-WB operation modes 2 and 8 were considered. Thus, the transmission bit rates considered were those presented in Table 7.
Firstly, an ideal scenario without drops is tested to assess the overhead reduction obtained by the e-RLRP in relation to the RLRP. Table 8 shows the overhead in the network scenario without drops; these results represent the average overhead of the nodes, considering AMR-WB Modes 8 and 2.
As expected, the results obtained in the ideal scenario without drops demonstrate that the e-RLRP obtained an overhead approximately 16% lower than that of the RLRP. This result is due to the fact that, in a scenario without drops, the e-RLRP keeps the frequency of sending messages lower than the RLRP does.
In a real ad-hoc network environment, nodes move or may fail, degrading the network performance. Therefore, the e-RLRP, RLRP, BATMAN and OLSR protocols are tested in scenarios where mobility occurs. The throughput and Ppl results in the so-called scenario P are presented in Table 9 and Table 10, respectively.
The results presented in Table 9 and Table 10 demonstrate that the e-RLRP and RLRP have a better performance than BATMAN and OLSR. The e-RLRP and RLRP have a Ppl value close to zero because they avoid the route containing node B, which presents recurring drops. The value does not reach zero because, when the routing starts, both protocols choose the route through node B, which has the lowest number of hops; but, after successive drops of node B, both protocols no longer consider the use of this route.
Differently, the OLSR protocol chooses the route that contains node B, because this is the path with the least number of hops. Despite the BATMAN protocol having obtained a higher Ppl than the e-RLRP and RLRP, it performed better than OLSR, which is due to its OGM messaging mechanism.
The overhead results for scenario P, considering AMR-WB Modes 2 and 8, are shown in Table 11.
The overhead results presented in Table 11 show that the e-RLRP reduced the overhead in relation to the RLRP by approximately 7% and also achieved better results than the BATMAN and OLSR protocols. This happens because the e-RLRP reduces the frequency of sending Hello messages.
Similarly, the same network performance parameters are evaluated in scenario R. The throughput and Ppl results when nodes are shut down 3 times are presented in Table 12 and Table 13, respectively.
As can be observed in Table 12 and Table 13, the OLSR had the worst performance considering Ppl and throughput, which is explained by the use of RL in the e-RLRP and RLRP and by the BATMAN OGM message mechanism. The e-RLRP reached throughput results similar to those of the other protocols, but in some scenarios the Ppl was significantly reduced with the e-RLRP.
The overhead when nodes are shut down 3 times is presented in Table 14. The results demonstrate that the overhead of the e-RLRP is lower than that of the RLRP, reaching in some scenarios a reduction close to 18%.
The throughput, Ppl and overhead when nodes are shut down 5 times are presented in Table 15, Table 16 and Table 17, respectively.
According to the results obtained in the Five Drops scenario, the e-RLRP and RLRP algorithms performed better than BATMAN and OLSR. Also, the e-RLRP obtained an overhead reduction and lower Ppl values in relation to the RLRP.
Similarly, the throughput, Ppl and overhead when nodes are shut down 7 times are presented in Table 18, Table 19 and Table 20, respectively.
According to the presented results, in the scenario where 7 drops occur, the e-RLRP obtains a better performance than BATMAN and OLSR in all cases, and it also presents a better performance than the RLRP in most of the scenarios.
In general, the performance gain in the R scenarios was lower than in scenario P. This behavior occurs because, in scenario P, the drops recur on only one route, which facilitates the learning process of the e-RLRP.
Figure 8 shows the e-RLRP performance improvement in relation to the other protocols. The Ppl values are the average of the results obtained in the four topologies and in both AMR-WB modes used in the tests.
From the results, we can conclude that the performance advantage of the e-RLRP in relation to the other three protocols increases when the number of drops increases. By increasing the number of drops, the performance of all algorithms degrades; however, in the e-RLRP and RLRP this degradation is lower.
Figure 9 shows the relationship between the e-RLRP performance and the number of nodes in the network. The Ppl values are the average of the results obtained in the scenarios with 3, 5 and 7 drops and in both AMR-WB modes used in the tests. It is important to note that the higher the number of nodes in the network, the higher the processing needed by the RL algorithm to determine the reward values. Although the RL processing increases, the performance obtained by the e-RLRP in terms of Ppl remains superior to that of the other routing protocols.
The throughput, Ppl and overhead for three flows, considering AMR-WB Mode 8, are presented in Table 21, Table 22 and Table 23, respectively.
Similarly, the throughput, Ppl and overhead for three flows, considering AMR-WB Mode 2, are presented in Table 24, Table 25 and Table 26, respectively.
Analyzing the scenario with 3 flows, it can be seen that the e-RLRP outperforms the other protocols in most cases considering Ppl and throughput. Regarding the overhead, the e-RLRP reached the best results in all the network scenarios.
The throughput, Ppl and overhead for four flows, considering AMR-WB Mode 8, are presented in Table 27, Table 28 and Table 29, respectively.
Similarly, the throughput, Ppl and overhead for four flows, considering AMR-WB Mode 2, are presented in Table 30, Table 31 and Table 32, respectively.
We can see from the results of the 4-flow scenarios that the e-RLRP outperforms the other protocols in most cases in terms of throughput and Ppl. Regarding the overhead, the e-RLRP obtained the best results in all tested scenarios. In addition, it is worth mentioning that the overhead in general increases when there are more flows; however, the overhead generated specifically by the Hello control messages is not much affected, because Hello messages are exchanged regardless of the number of flows in the network.
In the results presented in Table 27, Table 28, Table 30 and Table 31, regarding throughput and Ppl, we can observe that one of the traffic flows (noted as MK) reached a Ppl almost equal to 0, because there is a direct route between the two pairs of nodes and no drops occurred on this path. The extra flows were added in order to overload the network.
Analyzing the results of the scenarios with more than one traffic flow, specifically three and four flows, it is possible to observe that the e-RLRP outperforms the other routing protocols in most of the network scenarios tested, in terms of Ppl and throughput. Regarding the overhead, we can see that the e-RLRP continues to outperform the other protocols. The experimental results confirmed that the e-RLRP obtained a lower overhead than the RLRP in most of the scenarios, even when the number of traffic flows, the number of routes or the number of node drops was increased. Thus, these results demonstrate that the proposed adjustment function works properly in the task of overhead reduction.
Additionally, the RTT values obtained in scenario P are presented in Figure 10.
Figure 11 shows the average RTT values of the R scenarios, also with a single flow. These results represent the average values of the two AMR-WB modes, because there was no difference between them.
Analyzing the results presented in Figure 10 and Figure 11, it is observed that the e-RLRP and RLRP presented the highest RTT values. This can be justified because they are implemented in user space on Linux using a dynamic Python interpreter. According to Reference [40], this type of implementation generates a great loss of performance, mainly due to the high number of I/O operations that cause delays in the packet sending process. It is worth mentioning that this is a limitation generated by the language in which the protocols were implemented and not by the code/project. Thus, the implementation of both protocols had a restriction in this regard, which was reflected in the RTT values obtained in the experimental tests. According to (6) and (7), delays in the network have a negative impact on speech quality predictions.
Finally, the speech communication quality was evaluated using (4)–(6), considering the Ppl and RTT values found in the test scenarios with a single traffic flow for the topologies T1, T2, T3 and T4 used in this work.
Figure 12 presents the speech quality scores for scenario R with Three Drops, Figure 13 presents the scores for Five Drops and Figure 14 those for Seven Drops. Figure 15 presents the scores for scenario P.
As can be observed in Figure 12, Figure 13, Figure 14 and Figure 15, the use of the e-RLRP promotes a gain in the speech quality score in relation to those obtained by the RLRP, BATMAN and OLSR protocols. In some cases, the gain in relation to OLSR is more than 90%; in relation to BATMAN, the gain reaches approximately 33%; and in relation to the RLRP, the gain approaches 8%.
Therefore, the use of RL in routing protocols improves the user's QoE in a speech communication service. The e-RLRP not only reduces the overhead but also has a positive impact on the quality of the VoIP communication, mainly because the Ppl is decreased.
6. Conclusions
In this work, the experimental results demonstrate that a routing protocol based on RL overcomes traditional protocols, such as BATMAN and OLSR, specifically in terms of the Ppl and throughput parameters. These network performance results confirm the relevance of RL-based routing protocols for improving computer networks, particularly ad-hoc networks. However, the RL technique generates an extra overhead. Thus, the proposed adjustment algorithm was able to reduce the network overhead by reducing the number of control messages. The dynamic adjustment of the frequency of sending Hello messages provided an overhead reduction of up to 18%. This gain increases the network's payload capacity, providing a better network performance. The global performance of the proposed method was optimized using different configurations and parameter values, leading to a final configuration that was defined experimentally. In terms of throughput and Ppl, in most of the test scenarios used in this work, the e-RLRP achieved a better performance, especially with respect to the Ppl parameter. Therefore, it is demonstrated that the proposed solution reduces the overhead and also improves the network conditions.
Reducing the network overhead in conventional protocols is an important approach because it provides performance improvements. This approach is even more relevant when new routing techniques, such as RL, are used, which aim to improve network performance but generate extra overhead. Thus, an important contribution of this work is to demonstrate that this extra overhead can be reduced using the proposed dynamic adjustment function.
It is worth noting that in our experimental tests different network topologies and configurations were used, including different numbers of nodes and their drops, and also different numbers of traffic flows.
Also, the experimental results show the impact of the network performance parameters on the user's QoE in VoIP communication services. The e-RLRP obtained better speech quality scores because it had lower Ppl values, despite having higher RTT values, which are computed according to (6) and (7) as defined in the WB E-model algorithm. In this case, it is observed that the Ppl has a greater negative impact on speech quality than the RTT, for the values obtained in the simulation scenarios considered in this research. The results indicate a quality improvement of more than 90% compared to OLSR, and of up to 8% compared to the RLRP. Therefore, it can be concluded that RL-based routing protocols have a significant positive impact on the user's QoE in real-time communication services.
As a general conclusion, this research highlights the usefulness of incorporating machine learning algorithms in routing protocols, especially for ad-hoc networks that recurrently present node drops. RL-based routing protocols can help to improve network conditions and, as a consequence, different communication applications are improved. In this work, only the VoIP service is evaluated; in future works, video communication services will also be evaluated. Also, the dynamic adjustment mechanism implemented for the sending of Hello messages provided a performance improvement on the network, mainly by reducing the overhead, which is an important approach to be applied in RL-based routing protocols.
In future work, the proposed e-RLRP will be implemented in a real network environment to validate the performance results and potential benefits found in our simulation tests. Also, the inclusion of a scheduler or decentralized schedulers working in conjunction with the e-RLRP algorithm will be considered in future research, in which more complex and dynamic networks will also be implemented.