Article

An Intersection-Based Routing Scheme Using Q-Learning in Vehicular Ad Hoc Networks for Traffic Management in the Intelligent Transportation System

by Muhammad Umair Khan 1, Mehdi Hosseinzadeh 2,3,4,* and Amir Mosavi 5,6,*
1 School of Computing, Gachon University, 1342, Seongnam-daero, Sujeong-gu, Seongnam-si 13120, Korea
2 Institute of Research and Development, Duy Tan University, Da Nang 550000, Vietnam
3 School of Medicine and Pharmacy, Duy Tan University, Da Nang 550000, Vietnam
4 Computer Science, University of Human Development, Sulaymaniyah 0778-6, Iraq
5 Faculty of Civil Engineering, Technische Universität Dresden, 01069 Dresden, Germany
6 Faculty of Informatics, Obuda University, 1034 Budapest, Hungary
* Authors to whom correspondence should be addressed.
*
Authors to whom correspondence should be addressed.
Mathematics 2022, 10(20), 3731; https://doi.org/10.3390/math10203731
Submission received: 20 August 2022 / Revised: 5 October 2022 / Accepted: 8 October 2022 / Published: 11 October 2022
(This article belongs to the Section Network Science)

Abstract:
Vehicular ad hoc networks (VANETs) create an advanced framework to support the intelligent transportation system and increase road safety by managing traffic flow and avoiding accidents. These networks have specific characteristics, including high vehicle mobility, dynamic topology, and frequent link failures. For this reason, providing an efficient and stable routing approach for VANETs is a challenging issue. Reinforcement learning (RL) can solve various challenges and issues of vehicular ad hoc networks, including routing. Most existing reinforcement learning-based routing methods are incompatible with the dynamic network environment and cannot prevent congestion in the network. Network congestion can be controlled by managing traffic flow. For this purpose, roadside units (RSUs) must monitor the road status to stay informed about traffic conditions. In this paper, an intersection-based routing method using Q-learning (IRQ) is presented for VANETs. IRQ uses both global and local views in the routing process, and a dissemination mechanism for traffic information is introduced to create these views. Based on the global view, a Q-learning-based routing technique is designed for discovering the best routes between intersections. The central server continuously evaluates the created paths between intersections to penalize road segments with high congestion and improve the packet delivery rate. Finally, IRQ uses a greedy strategy based on the local view to find the best next-hop node in each road segment. The NS2 simulator is used for analyzing the performance of the proposed routing approach, and IRQ is compared with three methods, namely IV2XQ, QGrid, and GPSR. The simulation results demonstrate that IRQ has an acceptable performance in terms of packet delivery rate and delay, although its communication overhead is higher than that of IV2XQ.

1. Introduction

The intelligent transportation system (ITS) plays an important role in improving modern life in a digital world. In 2017, the global demand for connected vehicles reached 63,026 million dollars, and an annual growth rate of almost 17.1% is expected for this demand in 2018–2025, reaching 225,158 million dollars in 2025 [1]. ITS can provide comprehensive and innovative services to improve traffic management in the future. This system builds smart vehicles through wireless communication technology [2,3]. Vehicles, along with road infrastructure, form a new wireless network known as the vehicular ad hoc network (VANET) [4,5]. The initial purpose of this network was drivers' safety and comfort in vehicular environments [6,7,8]. However, this view is slowly changing because the network is now used as an infrastructure for the intelligent transportation system and supports vehicles and any activity that requires Internet access in the smart city ecosystem. VANET is a subset of the mobile ad hoc network (MANET) [9,10]. It uses wireless technology to provide communication links between vehicles and fixed road infrastructure. In VANET, communications are classified into two groups: vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I). In V2V communications, each vehicle exchanges traffic information such as road status, accidents, and turning intentions with neighboring vehicles. In V2I communications, the trusted authority (TA) shares road information through a graphical user interface and helps drivers make informed decisions under special conditions to prevent accidents [1,11]. Figure 1 shows a vehicular ad hoc network.
VANETs have special features such as high-speed vehicles, dynamic topology, and scalability in terms of the number of nodes [11,12]. In the data transmission process, there are two main challenging issues: (1) the short connection time of the nodes and (2) frequent link failures. For this reason, achieving low delay when transmitting messages and forming reliable communications are important factors in the data transfer process for VANETs. Given the specific features of these networks, it is very challenging to provide an efficient and stable routing protocol for ITS [13,14]. Therefore, the routing protocols presented for other wireless networks are not necessarily suitable for the vehicular ad hoc network. Over the past decade, many researchers have tried to enhance the performance of routing protocols in VANETs. The various types of routing techniques in vehicular ad hoc networks include topology-based routing protocols [14,15], position-based routing protocols [14,15], geographic routing protocols [16,17], and hybrid routing protocols [16,17]. In recent years, many researchers have tried to solve the challenges of these networks [18,19,20]. They have used machine learning (ML) techniques to learn the routing process in VANETs adaptively and autonomously [21,22]. Reinforcement learning (RL) is the most common machine learning algorithm applied to the deployment of routing algorithms in VANETs [23]. This algorithm is popular because it uses a trial and error technique when designing protocols. In a reinforcement learning-based routing method, the agent explores the network environment by taking appropriate actions to achieve an optimized routing policy for the system [24,25,26]. For this purpose, the agent finds the best route between each source-destination pair based on the optimization criteria. Usually, this process uses local information about nodes to decide on routes, which results in low energy consumption and better network connectivity. However, the agent needs to identify the whole system in order to achieve an optimal routing technique.
In this paper, we present an Intersection-based Routing method using Q-learning (IRQ) for vehicular ad hoc networks. IRQ considers two views, global and local, in the routing process. The central server creates the global view; this is feasible because vehicles exhibit specific movement patterns every day, for example, regular visits to certain places and relatively stable vehicle densities in various areas. Moreover, buses and taxis travel predictable, fixed paths in urban areas. The global view gives us a general picture of traffic on urban roads, and we use it in our routing process. Based on this view, a Q-learning-based routing technique is designed to predict the best routes between intersections. In addition, IRQ utilizes a local view to design a greedy strategy on each road segment to find the best next-hop node. The local view includes information about the vehicle status, including location, distance, connection time, and delay for all available vehicles in that road segment. Therefore, our contributions include the following points:
  • IRQ presents a dissemination mechanism of traffic status information to provide both global and local views in the network. The purpose of this mechanism is to update traffic information constantly and inform the network server relative to traffic status in the network at any moment. According to this mechanism, beacon messages are periodically disseminated on the network and sent to the central server by roadside units (RSUs). This information is stored in the central server. IRQ utilizes this information in the global view-based routing process and the local view-based routing process.
  • In IRQ, the central server executes a global view-based routing algorithm to calculate various paths between different intersections based on the traffic status information. In this process, the agent (central server) trains a Q-table based on traffic status information such as node density, connection time, and delay in road segments. In this method, the Q-value indicates how suitable each intersection is for forwarding data packets to the destination. Furthermore, the central server constantly evaluates the created paths to penalize paths with high congestion, thereby improving the packet delivery rate.
  • In IRQ, each vehicle uses a local view-based routing algorithm to find the best route in any road segment. This method applies a greedy routing technique to reach the destination.
The rest of the paper is organized as follows: Section 2 presents the related works. In Section 3, the Q-learning algorithm is described because the proposed method utilizes this learning technique in the routing process. Section 4 describes the network model used in the proposed routing approach. In Section 5, we introduce the intersection-based routing method using Q-learning (IRQ) for vehicular ad hoc networks. In Section 6, the proposed method is implemented to evaluate its performance in terms of packet delivery rate, end-to-end delay, hop count, and routing overhead. Finally, Section 7 concludes the paper.

2. Related Works

Ji et al. [27] have suggested a reinforcement learning-based hybrid routing protocol (RHR) for VANETs. It refreshes routing information using the RL algorithm. Additionally, RHR does not depend on a single route and uses multiple routes. It utilizes a packet-carry-on feedback system to give a positive reward to paths that improve the packet transmission process and to penalize routes that require many control messages and suffer from packet loss. In RHR, the routing table is periodically refreshed according to information extracted from received packets, and the optimal route is selected from this table. To reduce routing overhead, RHR utilizes a conditional routing approach: since the overhead is high when the agent has to evaluate many states, RHR considers only a fixed number of states.
Saravanan et al. [28] have presented the VANET routing protocol using deep reinforcement learning (VRDRT). This approach utilizes deep reinforcement learning (DRL) to calculate the movement pattern of vehicles in the road segments. In VRDRT, the authors argue that a routing algorithm must predict road segments with the highest density to reduce the use of the store-carry-forward (SCF) mechanism. Thus, it applies DRL to forecast road traffic status at a certain time. In this routing approach, each roadside unit (RSU) is responsible for collecting and storing traffic information about vehicles in each road segment to forecast road traffic using DRL. In VRDRT, DRL calculates the delay and the destination location. VRDRT uses a clustering process to divide roads into several clusters. In the clustering process, the density of vehicles is considered. This parameter is calculated as the ratio of the number of vehicles available in each road segment to the total number of vehicles in the network. VRDRT consists of two phases: route selection and route creation. The task of the first phase is to search for the optimal path among the discovered paths, while the task of the second phase is to find various paths. Both phases utilize DRL to achieve the desired result. The DRL agent selects the best path based on previous experiences. As a result, this method is regarded as a supervised learning scheme.
Wu et al. [29] have introduced the Q-learning-based traffic-aware routing (QTAR) protocol in VANETs. This method utilizes geographic routing strengths and successfully uses RSUs to forward packets to the desired vehicle. In QTAR, Q-learning is used in two modes: the data transmission between vehicles (or V2V routing mode) and the data transmission between RSUs (or R2R routing mode). In the V2V routing mode, data packets are considered as the agent, and vehicles are regarded as the state space. In the R2R routing mode, packets are also considered as the agent, and neighboring RSUs indicate the state space. In QTAR, there are two Hello packets, namely HelloV2V and HelloR2R, which are used for V2V and R2R routing modes, respectively. In this method, the authors consider two factors, namely the minimum end-to-end delay and high connection reliability, for determining Q-value in each state. QTAR assumes that each road segment is managed by an RSU. Furthermore, QTAR is an urban routing approach, which is dependent on road intersections.
Yang et al. [30] have proposed the heuristic Q-learning-based VANET routing (HQVR) protocol for vehicular ad hoc networks. This method calculates link reliability and selects intermediate nodes based on this factor. HQVR is a decentralized algorithm, so the learning process is executed based on the information obtained from the exchanged beacon messages. HQVR ignores the road width when designing the routing process. Furthermore, the Q-learning algorithm in VANET depends on the beacon message rate, which mainly slows down convergence. HQVR utilizes a heuristic procedure to increase the convergence speed of the learning algorithm. It determines the learning rate based on the link connection time. Note that the learning rate determines the convergence speed. HQVR applies a strategy to improve the route exploration process. In this strategy, delay information is saved in packets. Thus, if a node finds that a new route has less delay than the previous route, it updates the old route. Feedback messages travel various paths to reach the destination. Thus, the best route selection process by the source node is flexible.
Wu et al. [31] have suggested the Q-learning-based VANET delay-tolerant routing protocol (QVDRP). It utilizes gateways to send packets from the source vehicle to the cloud server. QVDRP uses a position forecasting technique. Moreover, this method considers RSUs as gateways for connecting with the servers. The purpose of QVDRP is to send the data generated by vehicles to RSUs. This method reduces delay and maximizes the packet delivery rate. In QVDRP, the network is the learning environment, and vehicles are the learning agents. In the learning process, the selection of the next-hop node is an action. Vehicles hold a Q-table for storing the Q-values corresponding to nodes. The Q-table is updated by periodically exchanging Hello messages. If the transmitter vehicle communicates directly with the destination, it is rewarded. If a previous-hop node hears packets from a node before a threshold time, it obtains a discounted positive reward. Otherwise, the Q-value is equal to 0.75. The collision probability forecasts the input and output directions in each road segment to reduce duplicated packets.
Karp and Kung in [32] have presented greedy perimeter stateless routing (GPSR) for ad hoc networks. In greedy schemes, vehicles use the information of single-hop neighboring nodes in routing decisions to send data to the next-hop node closest to the destination. GPSR is a geographic routing approach that utilizes two techniques, namely greedy forwarding and perimeter forwarding, and uses beacon messages to update the neighbor table. GPSR has low delay and acceptable routing overhead. However, it does not consider parameters such as delay, node velocity, or motion direction in the routing process, which limits its applicability in VANETs.
Li et al. in [33] have offered the Q-learning and grid-based routing protocol (QGrid) for VANETs. In this approach, the network environment is partitioned into several grids. Then, Q-learning learns the features of the traffic flow in these grids and chooses the optimal grid. Next, the greedy method and a second-order Markov chain prediction technique are used to select the relay node in this grid. QGrid has also addressed the packet delivery issue from a vehicle to a fixed destination. In the inter-grid routing process, this protocol always selects the optimal sequence of grids with the largest Q-values. However, if the number of packets increases in the network, this method does not offer any mechanism for controlling network congestion. Additionally, obstacles such as intersections and buildings may affect the data transmission process, but QGrid ignores this issue. In this scheme, the Q-table is built off-line and is fixed throughout the simulation process. This scheme does not consider load in the routing process.
Lou et al. [34] have proposed intersection-based V2X routing via Q-learning (IV2XQ) for VANETs. It is a hierarchical routing method. A Q-learning-based routing algorithm is designed at the intersection level. This algorithm uses historical traffic information to discover the network environment and select the best routes between intersections. In this learning process, road intersections are considered the state space, and road segments are regarded as the action space. This reduces the number of states in the Q-learning algorithm and improves its convergence speed. Furthermore, a greedy strategy is used for selecting the next-hop node on road segments, utilizing the positions of vehicles in the road segments. In addition, in this method, RSUs are responsible for monitoring the network status in real time to manage the network load and prevent congestion. In IV2XQ, no control packets are exchanged to discover routes, which reduces routing overhead and improves delay. IV2XQ only uses historical traffic information to learn the network environment. However, relying on fresh information is essential for making correct routing decisions.
The advantages and disadvantages of the methods mentioned in this section are summarized in Table 1.

3. Basic Concepts

In this section, the Q-learning algorithm is briefly described because IRQ utilizes this learning technique to find the best route between intersections in the data transmission process in VANET. Reinforcement learning (RL) is an important and useful tool in machine learning (ML). It has two main components, namely the agent and the environment. The agent is responsible for performing a set of actions ($A$) and continually interacting with the environment to explore it. The agent decides on an action based on the Markov decision process (MDP) to obtain an optimal solution for the desired problem [35,36]. MDP is a framework for modeling various decision problems and can manage this process stochastically. An MDP is defined by the four-tuple $\langle S, A, P, r \rangle$. $S$ and $A$ indicate the finite state and action sets, respectively. $P$ represents the transition function, meaning that the current state $s$ changes to the next state $s'$ after taking the action $a$. Moreover, $r$ is the reward given to the agent by the environment after taking the action $a_t$ in the state $s_t$ at time $t$. According to Figure 2, the agent observes its current state $s_t$ at time $t$ and performs the action $a_t$. Based on the performed action and the current state, the learning environment returns the reward $r$ and the next state $s_{t+1}$ to the agent. In the reinforcement learning process, the main purpose is to achieve the optimal policy $\pi$ and receive the maximum reward from the environment [37]. In the long term, the agent seeks to maximize the expected discounted reward $\max \sum_{t=0}^{T} \delta^{t} r_t(s_t, \pi(s_t))$, where $\delta \in [0, 1]$ is known as the discount factor. The discount factor indicates the importance of future rewards and reflects the agent's effort to explore the environment. When this value is close to one, the agent relies more on previous experience; when it approaches zero, the agent only benefits from the latest reward.
Based on the reward value, when the transition probabilities are determined, the Bellman equation, called the Q-function, is formed to select the next action $a_{t+1}$ using the MDP. The Q-function is calculated based on Equation (1):

$$Q(s_t, a_t) = (1 - \alpha)\, Q(s_t, a_t) + \alpha \left[ r + \delta \max_{a} Q(s_{t+1}, a) \right] \quad (1)$$
$\alpha$ is the learning rate, with $0 < \alpha \le 1$. This parameter determines the agent's attention to new and old information. When $\alpha$ approaches zero, the agent pays little attention to new information in the learning process. In contrast, when this parameter approaches one, the agent only pays attention to the latest information and disregards old information.
Q-learning is a model-free RL algorithm. The learning agent uses this algorithm to explore the environment and learn the optimal strategy through trial and error. In this algorithm, state-action pairs and Q-values are maintained in the Q-table. The purpose of this algorithm is to adjust the action selection strategy based on the reward received from the environment to maximize the Q-value by selecting the best action in the future. Q-values are updated in each iteration using Equation (1). Then, the agent starts the exploitation process by taking actions that maximize Q-values. This policy is known as $\varepsilon$-greedy: the agent chooses between exploration and exploitation based on the probability value $\varepsilon$ [38,39].
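The update rule of Equation (1) and the $\varepsilon$-greedy policy can be illustrated with a minimal sketch. The toy chain environment below (states 0–3, a reward only at the goal) is an assumption for demonstration and is unrelated to IRQ's own state space:

```python
import random

# Minimal tabular Q-learning sketch of Equation (1) with an epsilon-greedy
# policy. The toy chain environment is purely illustrative.
random.seed(0)                          # for reproducibility
ALPHA, DELTA, EPSILON = 0.5, 0.9, 0.1   # learning rate, discount, exploration

N_STATES, GOAL = 4, 3
ACTIONS = (0, 1)                        # 0 = stay, 1 = move right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Toy transition function: only reaching the goal state yields a reward."""
    nxt = min(state + 1, GOAL) if action == 1 else state
    return nxt, (1.0 if nxt == GOAL else 0.0)

def choose_action(state):
    """Epsilon-greedy: explore with probability EPSILON, otherwise exploit."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

for _ in range(200):                    # training episodes
    s = 0
    while s != GOAL:
        a = choose_action(s)
        s_next, r = step(s, a)
        # Equation (1): blend the old estimate with the discounted target.
        best_next = max(Q[(s_next, b)] for b in ACTIONS)
        Q[(s, a)] = (1 - ALPHA) * Q[(s, a)] + ALPHA * (r + DELTA * best_next)
        s = s_next

# After training, "move right" carries the higher Q-value in each non-goal state.
```

The same structure carries over to IRQ's global view, where the state space is the set of intersections rather than a toy chain.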

4. Network Model

In IRQ, the network includes several intersections that connect to each other through two-lane road segments. The network supports two types of communication, namely vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I). There are three main entities in the network: the central server, roadside units (RSUs), and vehicles. The network model is shown in Figure 3. Note that each component in the network, including road segments, intersections, the central server, RSUs, and vehicles, has a unique identification. In the following, each entity in the network is described in detail:
  • Central server: This server uses traffic status messages to achieve a global view of the whole network. This robust and high-energy node plays the agent role in the global view-based routing process to discover the VANET environment. The central server learns the best routing strategy between intersections by interacting with the network environment using the Q-learning algorithm. Finally, this node sends the Q-table to RSUs in the network.
  • Roadside units (RSUs): These entities are located at intersections and are directly connected to the central server. They are responsible for monitoring the network, periodically sending traffic status messages to the central server, and controlling congestion on each road segment. In addition, each RSU is responsible for finding the best route from the Q-table stored in its memory and sending data packets to the next intersection based on this route.
  • Vehicles: These entities periodically exchange beacon messages between themselves. The information obtained from this message is stored in the neighborhood table of each vehicle. Moreover, each vehicle is equipped with a positioning system to obtain its spatial and speed information at any moment. In each road segment, vehicles utilize a local view-based routing method to communicate with other vehicles in a multi-hop manner.
Figure 3. Network model in the proposed method.

5. Proposed Method

In this section, we introduce the Intersection-based Routing method using Q-learning (IRQ) for vehicular ad hoc networks. The proposed method includes two views, namely global and local. The global view provides the network server with a general picture of traffic on the urban road segments, whereas the local view includes the status of vehicles, including position, speed, link lifetime, and one-hop delay. IRQ includes three main phases, described in the following subsections:
  • Dissemination mechanism of traffic status information;
  • Global view-based routing algorithm;
  • Local view-based routing algorithm.

5.1. Dissemination Mechanism of Traffic Information

In this section, the dissemination mechanism of traffic information is described. The purpose of this mechanism is to make the proposed method compatible with the dynamic topology of VANETs by creating the global and local views of the network. Note that traffic status information is constantly updated so that the server is aware of the network status at any moment. According to this mechanism, each vehicle (such as $V_i$) periodically broadcasts a beacon message to its neighboring nodes, including its identification ($ID_{V_i}$), road ID ($ID_R$, where $R$ indicates the desired road segment, $R = 1, 2, \ldots, M$, and $M$ is the total number of road segments in the network), information about its queue delay, its position $(x_i(t), y_i(t))$, and its speed $(v_{x,i}(t), v_{y,i}(t))$. After receiving beacon messages from its neighbors, $V_i$ forms a neighborhood table ($Table_{neighbor}^{i}$) and records information about its neighbors (such as $V_j$). The format of $Table_{neighbor}$ is presented in Table 2. As shown in this table, there are two new fields in $Table_{neighbor}$, namely connection time ($CT_{i,j}(t)$) and delay ($Delay_{i,j}$). These fields are explained in the following.
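The beacon contents and the per-vehicle neighborhood table can be sketched as simple records. This is an illustrative Python sketch; the class and field names are hypothetical, and the authoritative formats are the beacon message described above and Table 2:

```python
from dataclasses import dataclass, field

# Sketch of a beacon message and the neighborhood table it populates.
# Field names are illustrative stand-ins for the fields listed in the text.
@dataclass
class Beacon:
    vehicle_id: int       # ID_Vi of the sender
    road_id: int          # ID_R: which of the M road segments the sender is on
    queue_delay: float    # seconds a packet waits in the sender's buffer
    x: float              # position at time t
    y: float
    vx: float             # velocity vector at time t
    vy: float

@dataclass
class NeighborEntry:
    beacon: Beacon
    connection_time: float = 0.0   # CT_ij, later filled in via Equation (2)
    delay: float = 0.0             # Delay_ij, later filled in via Equation (4)

@dataclass
class Vehicle:
    vehicle_id: int
    table_neighbor: dict = field(default_factory=dict)

    def on_beacon(self, b: Beacon):
        """Record or refresh a neighbor entry when a beacon is received."""
        self.table_neighbor[b.vehicle_id] = NeighborEntry(beacon=b)

# V_1 receives a beacon from neighboring vehicle V_2 on road segment 7.
v1 = Vehicle(vehicle_id=1)
v1.on_beacon(Beacon(vehicle_id=2, road_id=7, queue_delay=0.01,
                    x=10.0, y=5.0, vx=12.0, vy=0.0))
```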

5.1.1. Calculating the Connection Time of Two Vehicles

In this section, we explain how to calculate the connection time ($CT_{i,j}(t)$) between two vehicles $V_i$ and $V_j$. IRQ uses this factor in the routing process to increase the stability of the path formed between source and destination. The connection time between these vehicles is obtained from Equation (2). This factor depends on two parameters: the Euclidean distance between the vehicles and the relative velocity of $V_j$ with regard to $V_i$. Moreover, Equation (2) considers $\Delta\theta_{ij}$, which is the movement direction of $V_j$ relative to $V_i$.

$$CT_{ij} = \begin{cases} \dfrac{R_{com} + \sqrt{(x_i(t) - x_j(t))^2 + (y_i(t) - y_j(t))^2}}{\left| v_i(t) - v_j(t) \right|}, & 0 \le \Delta\theta_{ij} \le \frac{\pi}{3} \text{ and } v_i \ge v_j \\ \dfrac{R_{com} - \sqrt{(x_i(t) - x_j(t))^2 + (y_i(t) - y_j(t))^2}}{\left| v_i(t) - v_j(t) \right|}, & 0 \le \Delta\theta_{ij} \le \frac{\pi}{3} \text{ and } v_i < v_j \\ \dfrac{R_{com} + \sqrt{(x_i(t) - x_j(t))^2 + (y_i(t) - y_j(t))^2}}{v_i(t) + v_j(t)}, & \frac{2\pi}{3} \le \Delta\theta_{ij} \le \pi \text{ and } V_i \text{ is close to } V_j \\ \dfrac{R_{com} - \sqrt{(x_i(t) - x_j(t))^2 + (y_i(t) - y_j(t))^2}}{v_i(t) + v_j(t)}, & \frac{2\pi}{3} \le \Delta\theta_{ij} \le \pi \text{ and } V_i \text{ is far from } V_j \end{cases} \quad (2)$$

where $R_{com}$ is the communication radius of vehicles. Moreover, $(x_i(t), y_i(t))$ and $v_i(t)$ are the spatial coordinates and speed of $V_i$ at time $t$, and $(x_j(t), y_j(t))$ and $v_j(t)$ are the spatial coordinates and speed of $V_j$ at time $t$, respectively. In addition, $\Delta\theta_{ij}$ is the motion direction of $V_j$ with regard to $V_i$. It is calculated using Equation (3):

$$\Delta\theta_{ij} = \cos^{-1} \left( \frac{v_{x,i}(t)\, v_{x,j}(t) + v_{y,i}(t)\, v_{y,j}(t)}{\sqrt{v_{x,i}(t)^2 + v_{y,i}(t)^2} \cdot \sqrt{v_{x,j}(t)^2 + v_{y,j}(t)^2}} \right), \quad 0 \le \Delta\theta_{ij} \le \pi \quad (3)$$

where $(v_{x,i}(t), v_{y,i}(t))$ and $(v_{x,j}(t), v_{y,j}(t))$ are the velocity vectors of $V_i$ and $V_j$ at time $t$, respectively.
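Equations (2) and (3) can be sketched in a few lines of Python. This is a minimal illustration, not a robust implementation: `R_COM` and the sample kinematics are assumed values, the `approaching` flag stands in for the "V_i is close to / far from V_j" condition, and the same-direction cases would divide by zero for equal speeds:

```python
import math

# Sketch of Equations (2) and (3): relative heading and link connection
# time between two vehicles. All numeric values are assumptions.
R_COM = 250.0  # communication radius of vehicles, in meters (assumed)

def delta_theta(vi, vj):
    """Equation (3): angle between the velocity vectors of V_i and V_j."""
    dot = vi[0] * vj[0] + vi[1] * vj[1]
    norm = math.hypot(*vi) * math.hypot(*vj)
    return math.acos(max(-1.0, min(1.0, dot / norm)))  # clamp rounding errors

def connection_time(pi, pj, vi, vj, approaching=True):
    """Equation (2): how long V_i and V_j remain within radio range."""
    dist = math.hypot(pi[0] - pj[0], pi[1] - pj[1])  # Euclidean distance
    si, sj = math.hypot(*vi), math.hypot(*vj)        # speeds of V_i and V_j
    ang = delta_theta(vi, vj)
    if ang <= math.pi / 3:            # nearly the same direction
        if si >= sj:                  # V_i closes in, passes, then pulls away
            return (R_COM + dist) / abs(si - sj)
        return (R_COM - dist) / abs(si - sj)   # V_j gradually escapes range
    if ang >= 2 * math.pi / 3:        # nearly opposite directions
        if approaching:               # "V_i is close to V_j": still closing in
            return (R_COM + dist) / (si + sj)
        return (R_COM - dist) / (si + sj)      # "V_i is far from V_j": receding
    return 0.0                        # intermediate angles: not covered by Eq. (2)

# V_i at the origin doing 20 m/s chases V_j 100 m ahead doing 10 m/s:
ct = connection_time((0.0, 0.0), (100.0, 0.0), (20.0, 0.0), (10.0, 0.0))  # 35.0 s
```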

5.1.2. Calculating the Delay between Two Vehicles

In this section, we describe how to calculate the single-hop delay ($Delay_{i,j}$) between two vehicles $V_i$ and $V_j$. IRQ uses this factor to select the path with the lowest delay so that data packets are sent to the destination in the shortest possible time. The single-hop delay between $V_i$ and $V_j$ is calculated based on Equation (4). This factor depends on three components: transmission delay, media access delay, and processing and queuing delay.

$$Delay_{ij} = D_{Mac}^{ij} + D_{Que}^{ij} + D_{Trans}^{ij} \quad (4)$$

where $D_{Que}^{ij}$ indicates the processing and queuing delay, equal to the time a packet needs to reach the top of the buffer queue; it is obtained from beacon messages. $D_{Mac}^{ij}$ indicates the media access delay. This parameter is estimated using ACK packets received from the neighboring node:

$$D_{Mac}^{ij} = t_{Ack} - t_{Send} \quad (5)$$

where $t_{Ack}$ is the time at which the ACK packet is received from the neighboring node $V_j$, and $t_{Send}$ is the moment at which $V_i$ sent the data packet to $V_j$.
The transmission delay ($D_{Trans}^{ij}$) indicates the time required to transfer the data packet from $V_i$ to $V_j$. It is calculated using Equation (6):

$$D_{Trans}^{ij} = \frac{I_m}{b_r} \quad (6)$$

where $I_m$ is the length of the message and $b_r$ indicates the data transmission rate.
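Equations (4)–(6) combine into a short sketch. The helper names and the sample numbers (MAC delay, queue delay, packet size, bitrate) are illustrative assumptions, not values from the paper:

```python
# Sketch of Equations (4)-(6): the one-hop delay between V_i and V_j as
# the sum of MAC, queuing, and transmission components.

def mac_delay(t_send, t_ack):
    """Equation (5): time from sending a data packet until its ACK arrives."""
    return t_ack - t_send

def transmission_delay(msg_len_bits, bitrate_bps):
    """Equation (6): message length divided by the data transmission rate."""
    return msg_len_bits / bitrate_bps

def one_hop_delay(t_send, t_ack, queue_delay, msg_len_bits, bitrate_bps):
    """Equation (4): Delay_ij = D_Mac + D_Que + D_Trans."""
    return (mac_delay(t_send, t_ack)
            + queue_delay                                   # D_Que, from beacons
            + transmission_delay(msg_len_bits, bitrate_bps))

# Example: 2 ms MAC delay, 1 ms queuing, a 512-byte packet over 6 Mbit/s.
d = one_hop_delay(t_send=0.000, t_ack=0.002, queue_delay=0.001,
                  msg_len_bits=512 * 8, bitrate_bps=6e6)
```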

5.2. Dissemination of Traffic State Information

RSUs receive beacon messages from vehicles on different road segments and form a traffic table ( T a b l e t r a f f i c ) according to these messages. The information in this table is periodically updated. The format of T a b l e t r a f f i c is presented in Table 3.
As shown in this table, each RSU counts the number of vehicles in each road segment based on the vehicle ID and the road ID inserted in beacon messages and records it in the vehicle density field. Additionally, the RSU obtains the connection time ($CT_{ij}$) and delay ($Delay_{ij}$) between two vehicles $V_i$ and $V_j$ using Equations (2) and (4). Then, the average connection time ($\overline{CT}_R$) and average single-hop delay ($\overline{Delay}_R$) of each road segment are calculated according to Equations (7) and (8), respectively. Finally, these parameters are inserted into the traffic table.
$$\overline{CT}_R = \frac{2}{n_R (n_R + 1)} \sum_{i=1}^{n_R} \sum_{j=i+1}^{n_R} CT_{ij} \quad (7)$$

$$\overline{Delay}_R = \frac{2}{n_R (n_R + 1)} \sum_{i=1}^{n_R} \sum_{j=i+1}^{n_R} Delay_{ij} \quad (8)$$

where $n_R$ indicates the total number of vehicles on the road segment $R$.
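Equations (7) and (8) average the pairwise values over all vehicle pairs on a road segment. A minimal Python sketch, using the paper's $2/(n_R(n_R+1))$ normalization with purely illustrative pairwise values (note that $2/(n_R(n_R-1))$ would be the exact count of unordered pairs):

```python
# Sketch of Equations (7) and (8): averaging pairwise connection times
# (or delays) over a road segment. The pairwise values are illustrative.

def road_average(pairwise):
    """pairwise: dict mapping (i, j) vehicle-ID pairs to CT_ij or Delay_ij."""
    n = len({v for pair in pairwise for v in pair})  # n_R vehicles on segment R
    total = sum(pairwise.values())                   # sum over i < j
    return 2.0 * total / (n * (n + 1))               # paper's normalization

ct_pairs = {(1, 2): 30.0, (1, 3): 20.0, (2, 3): 10.0}  # n_R = 3 vehicles
avg_ct = road_average(ct_pairs)   # 2 * 60 / (3 * 4) = 10.0
```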
Then, each RSU periodically sends traffic information about various road segments to the central server. In IRQ, the dissemination period of this information is 5 s because traffic status information changes only slightly during this interval. This period also indicates the validity duration of the message. The format of the traffic information packet is presented in Table 4. As shown in this table, each RSU sends information about the four road segments connected to its intersection to the central server.
This packet includes the following fields:
  • ID of RSU: This field represents the identification of RSU, which transmits the traffic information packet.
  • Intersection ID: This field indicates the identification of the intersection having the transmitter RSU.
  • Time to live: This field is adjusted based on the validity time of the traffic message (i.e., 5 s). After ending this period, the traffic information message is invalid and removed from the network.
  • Road ID: Each intersection is connected to the four road segments, including the upper road, the down road, the left road, and the right road. This field indicates the identification of the corresponding road segment based on the traffic table.
  • Vehicle density: This field is equal to the number of vehicles in a road segment (i.e., up, down, left, and right), which is inserted into the traffic table.
  • Average connection time: This field represents the average connection time related to each road segment, which is obtained from Equation (7) and recorded in the traffic table.
  • Average road delay: This field indicates the average single-hop delay corresponding to each road segment. It is calculated according to Equation (8) and stored in the traffic table.
Finally, the central server uses the traffic information messages to build a global view of the entire network. It applies the window mean with exponentially weighted moving average (WMEWMA) scheme to update three parameters, namely vehicle density, average connection time, and average delay, for each road segment. Each window records the last $w$ traffic information messages, so the server considers not only the latest traffic report but also a recent history, which gives it a better view of network traffic. Accordingly, the server updates the vehicle density, average connection time, and average delay of a road segment according to Equations (9)-(11), respectively:
$$Dens_R^l = (1 - \beta_1) \frac{\sum_{k=l-w}^{l-1} Dens_R^k}{w} + \beta_1 Dens_R \quad (9)$$
$$\overline{CT}_R^l = (1 - \beta_2) \frac{\sum_{k=l-w}^{l-1} \overline{CT}_R^k}{w} + \beta_2 \overline{CT}_R \quad (10)$$
$$\overline{Delay}_R^l = (1 - \beta_3) \frac{\sum_{k=l-w}^{l-1} \overline{Delay}_R^k}{w} + \beta_3 \overline{Delay}_R \quad (11)$$
where $\beta_1$, $\beta_2$, and $\beta_3$ are adjustable parameters in $[0, 1]$ and $w$ is the window length. The pseudo-code of the dissemination mechanism of the traffic information is presented in Algorithm 1.
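The WMEWMA update of Equations (9)-(11) blends the mean of the last $w$ reported values with the newest measurement. A small illustrative sketch, assuming `history` already holds exactly the last $w$ reports (the helper name `wmewma_update` is hypothetical):

```python
def wmewma_update(history, latest, beta, w):
    """Window Mean with EWMA (cf. Eqs. (9)-(11)): blend the mean of the
    last w reported values with the newest measurement, weighted by beta
    in [0, 1].  history is assumed to contain exactly w entries."""
    window_mean = sum(history) / w
    return (1 - beta) * window_mean + beta * latest
```

The same update is applied independently to density, average connection time, and average delay, each with its own $\beta$.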
Algorithm 1 Dissemination of traffic status information
  • Input:  V i , i = 1 , . . . , N
  •    N: The number of vehicles in the network.
  •     N n e i g h b o r i : The number of neighbors of V i .
  •    Server, Beacon messages, Traffic messages
  •     RS U k
  •     N R S U : The number of RSUs in the network.
  • Output:  T a b l e n e i g h b o r
  •     T a b l e t r a f f i c
  • Begin
1: for $i = 1$ to $N$ do
2:    if the time of sending a Beacon message arrives then
3:       $V_i$: Multicast Beacon messages to its neighboring nodes in the network;
4:    end if
5:    for $j = 1$ to $N_{neighbor_i}$ do
6:       if $V_i$ receives a Beacon message from $V_j$ then
7:          $V_i$: Insert the ID, spatial coordinates, and velocity of $V_j$ into $Table_{neighbor}$;
8:          $V_i$: Insert the ID of the road corresponding to $V_j$ into $Table_{neighbor}$;
9:          $V_i$: Calculate the connection time between $V_i$ and $V_j$ ($CT_{i,j}$) and insert it into $Table_{neighbor}$;
10:         $V_i$: Calculate the delay between $V_i$ and $V_j$ ($Delay_{i,j}$) and insert it into $Table_{neighbor}$;
11:      end if
12:   end for
13: end for
14: for $k = 1$ to $N_{RSU}$ do
15:    if $RSU_k$ receives a Beacon message from $V_i$ then
16:       if the ID of the road corresponding to $V_i$ is equal to $ID_{R_{up}}$ then
17:          $RSU_k$: Insert $ID_{R_{up}}$ into $Table_{traffic}$;
18:          $RSU_k$: Add one unit to $Dens_{R_{up}}$ in $Table_{traffic}$;
19:          $RSU_k$: Update $\overline{CT}_{R_{up}}$ in $Table_{traffic}$;
20:          $RSU_k$: Update $\overline{Delay}_{R_{up}}$ in $Table_{traffic}$;
21:       end if
22:       if the ID of the road corresponding to $V_i$ is equal to $ID_{R_{down}}$ then
23:          $RSU_k$: Insert $ID_{R_{down}}$ into $Table_{traffic}$;
24:          $RSU_k$: Add one unit to $Dens_{R_{down}}$ in $Table_{traffic}$;
25:          $RSU_k$: Update $\overline{CT}_{R_{down}}$ in $Table_{traffic}$;
26:          $RSU_k$: Update $\overline{Delay}_{R_{down}}$ in $Table_{traffic}$;
27:       end if
28:       if the ID of the road corresponding to $V_i$ is equal to $ID_{R_{left}}$ then
29:          $RSU_k$: Insert $ID_{R_{left}}$ into $Table_{traffic}$;
30:          $RSU_k$: Add one unit to $Dens_{R_{left}}$ in $Table_{traffic}$;
31:          $RSU_k$: Update $\overline{CT}_{R_{left}}$ in $Table_{traffic}$;
32:          $RSU_k$: Update $\overline{Delay}_{R_{left}}$ in $Table_{traffic}$;
33:       end if
34:       if the ID of the road corresponding to $V_i$ is equal to $ID_{R_{right}}$ then
35:          $RSU_k$: Insert $ID_{R_{right}}$ into $Table_{traffic}$;
36:          $RSU_k$: Add one unit to $Dens_{R_{right}}$ in $Table_{traffic}$;
37:          $RSU_k$: Update $\overline{CT}_{R_{right}}$ in $Table_{traffic}$;
38:          $RSU_k$: Update $\overline{Delay}_{R_{right}}$ in $Table_{traffic}$;
39:       end if
40:    end if
41:    if the time of sending a Traffic message arrives then
42:       $RSU_k$: Send Traffic messages to the server in the network;
43:    end if
44:    if the server receives a Traffic message from $RSU_k$ then
45:       Server: Update $Dens_R^l$, $\overline{CT}_R^l$, and $\overline{Delay}_R^l$ based on Equations (9)-(11), respectively;
46:    end if
47: end for
  End

5.3. Global View-Based Routing Algorithm

The central server is responsible for implementing the global view-based routing algorithm, which uses Q-learning to select the best routes in the network. In the routing process, the source vehicle first obtains its spatial coordinates and the position of the destination vehicle using the global positioning system (GPS). Then, it determines the source and destination intersections based on this information. Note that each of the road segments containing the source and destination vehicles has an intersection at each end, so these intersections must be determined before the routing process starts. The source vehicle selects, as the source intersection, the endpoint of its road segment that is closest to the destination node. Similarly, the destination node selects the intersection closest to the source node as the destination intersection. Then, the source vehicle sends data packets to the source intersection using the V2V communication described in Section 5.4. Next, the RSU at this intersection uses the Q-table stored in its memory to choose the next intersection with the highest Q-value. Finally, the RSU sends the packet toward the next intersection using the V2I communication described in Section 5.4. This process continues until the data packet reaches the destination vehicle. In this routing process, the central server plays the agent role and uses traffic status information to discover the network environment and find optimal paths between intersections. Furthermore, the VANET is regarded as the environment that interacts with the central server (i.e., the agent). When the agent performs an action, the environment responds and changes the agent's state. In this learning problem, the state space is $I = \{Intersect_1, Intersect_2, \ldots, Intersect_p\}$, the set of all intersections in the network, where $p$ is the total number of intersections. $Intersect_i^t$ means that the packet is at intersection $i$ at time $t$. The action space is $Road = \{R_{up}, R_{down}, R_{left}, R_{right}\}$, the set of road segments connected to the current intersection. After a road segment is selected, the packets are sent from $Intersect_i^t$ to $Intersect_j^{t+1}$. After this action, the environment gives the agent a reward based on the reward function presented in Equation (12):
$$R_t = \begin{cases} R_{\max}, & Intersect_j^{t+1} \text{ is the destination} \\ R_{\min}, & Intersect_j^{t+1} \text{ is a local minimum} \\ \dfrac{Dens_{R_{current}}^l}{\max_{R \in Intersect_i} Dens_R^l} + \dfrac{\overline{CT}_{R_{current}}^l}{\max_{R \in Intersect_i} \overline{CT}_R^l} + \left(1 - \dfrac{\overline{Delay}_{R_{current}}^l}{\max_{R \in Intersect_i} \overline{Delay}_R^l}\right), & \text{otherwise} \end{cases} \quad (12)$$
where $Dens_{R_{current}}^l$, $\overline{CT}_{R_{current}}^l$, and $\overline{Delay}_{R_{current}}^l$ indicate the vehicle density, average connection time, and average delay of the current road segment, respectively. These parameters are calculated based on Equations (9)-(11). According to the reward function, if the next intersection is the destination intersection, the road segment between $Intersect_i^t$ and $Intersect_j^{t+1}$ receives the maximum reward. On the other hand, when a local optimum occurs, meaning that all neighboring intersections of the current intersection are farther from the destination intersection than the current intersection, the intersection receives the minimum reward. Otherwise, the reward is evaluated based on the vehicle density, connection time, and delay.
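The three-way reward of Equation (12) can be sketched as a small function. The concrete values of $R_{\max}$ and $R_{\min}$ below are assumptions for illustration, since the paper does not fix them, and the function name `reward` is hypothetical:

```python
def reward(next_is_destination, next_is_local_minimum,
           dens, ct, delay, dens_max, ct_max, delay_max,
           r_max=100.0, r_min=-100.0):
    """Reward of Eq. (12): maximum reward when the next intersection is
    the destination, minimum reward at a local minimum; otherwise a
    score that favours segments with high density, long connection
    time, and low delay (all terms normalized by the per-intersection
    maxima).  r_max / r_min are assumed placeholder values."""
    if next_is_destination:
        return r_max
    if next_is_local_minimum:
        return r_min
    return dens / dens_max + ct / ct_max + (1 - delay / delay_max)
```

Each fraction is in $[0, 1]$, so the "otherwise" branch yields a score between 0 and 3 that lies strictly between the two extreme rewards.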
In the learning process, if the learning parameters (i.e., the discount factor and the learning rate) are constant, the routing algorithm cannot adapt to the dynamic environment. Therefore, in IRQ, the discount factor ($\delta$) is calculated dynamically according to the network conditions, while the learning rate ($\alpha$) is fixed at the empirical value of 0.1 according to [34]. Note that the purpose of the routing process is to find the next intersection for sending data packets. As a result, $\delta$ should be selected according to the density of the road segment and the distance between the next intersection and the destination intersection: if the next intersection is far from the destination, or the density of the corresponding road segment is very high or very low, the Q-value of this intersection will not be stable, and $\delta$ should therefore be low. IRQ calculates the discount factor using Equation (13), which considers two parameters: vehicle density and the distance to the destination:
$$\delta = \lambda \times \left(1 - \frac{\left|Dens_R^l - Dens_{threshold}^{\min}\right|}{\max_{R \in Intersect_i} Dens_R^l - \min_{R \in Intersect_i} Dens_R^l}\right) + (1 - \lambda) \times \max\left(0,\ 1 - \frac{d(t+1, D)}{d(t, D)}\right) \quad (13)$$
where $\lambda$ is a weight coefficient with $0 \le \lambda \le 1$, and $Dens_R^l$ indicates the vehicle density on the current road segment. $Dens_{threshold}^{\min}$ is a minimum vehicle-density threshold at which the packet delivery rate is still acceptable; it is determined empirically. $d(t, D)$ is the distance between the current intersection ($Intersect^t$) and the destination intersection, and $d(t+1, D)$ is the distance between the next intersection ($Intersect^{t+1}$) and the destination intersection. It is calculated based on Equation (14):
$$d(x, D) = \sqrt{(x_x - x_D)^2 + (y_x - y_D)^2} \quad (14)$$
where $(x_x, y_x)$ and $(x_D, y_D)$ are the coordinates of the intersection in question and the destination intersection, respectively. According to Equation (13), $\delta$ is larger when the next intersection is closer to the destination intersection than the current intersection and the vehicle density of the corresponding road segment is close to the minimum density threshold.
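Equation (13) can be sketched as below. The absolute-value reading of the density term is an interpretation of the text ("density close to the minimum threshold gives a larger $\delta$"), and the function name and default $\lambda$ are illustrative:

```python
def discount_factor(dens, dens_min_thr, dens_max, dens_min,
                    d_next, d_curr, lam=0.5):
    """Discount factor of Eq. (13): a lambda-weighted mix of (a) how
    close the segment density is to the minimum density threshold
    (|dens - threshold| is an interpretation of the garbled source) and
    (b) how much closer the next intersection is to the destination."""
    density_term = 1 - abs(dens - dens_min_thr) / (dens_max - dens_min)
    distance_term = max(0.0, 1 - d_next / d_curr)
    return lam * density_term + (1 - lam) * distance_term
```

When the density equals the threshold and the next intersection halves the remaining distance, both terms are favourable and $\delta$ approaches its maximum.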
To explore the network environment, the agent performs various actions to reach different states. Through this process, a Q-value is obtained for each state-action pair and stored in a Q-table. This value is used for selecting routing paths. After calculating the Q-table, the agent sends it to the RSUs in the network. Note that in the global view-based routing process, the server prevents congestion on the routes to reduce collisions and packet loss, using the traffic status information. Whenever the delay on a road segment exceeds a threshold $D_{threshold}$ (i.e., $\overline{Delay}_R \ge D_{threshold}$), or the vehicle density is higher than the threshold $Dens_{threshold}^{\max}$ (i.e., $Dens_R \ge Dens_{threshold}^{\max}$), that road segment increases collisions and packet loss. Likewise, when the vehicle density on a road segment is less than the threshold $Dens_{threshold}^{\min}$ (i.e., $Dens_R \le Dens_{threshold}^{\min}$), it increases packet loss and delay in the transmission process. In these cases, the server penalizes such road segments by reducing their reward so that they are not selected for data delivery. The pseudo-code of this process is presented in Algorithm 2.
Algorithm 2 Global view based-routing process
  • Input:  ε , α , γ : Q-learning parameters
  •     N R : The number of covered routes in the network.
  •     $I = \{Intersect_1, Intersect_2, \ldots, Intersect_p\}$
  •     $Road = \{R_{up}, R_{down}, R_{left}, R_{right}\}$
  • Output: Q-Table
  • Begin
1: while the convergence condition is not met do
2:    for $episode = 1$ to $M$ do
3:       Server: Select an intersection as the initial state $Intersect_i^t$;
4:       for $t = 1$ to $N$ do
5:          Server: Select a random number $num_{rand}$ in $[0, 1]$;
6:          if $num_{rand} \le \varepsilon$ then
7:             Server: Choose an action from $Road = \{R_{up}, R_{down}, R_{left}, R_{right}\}$ randomly;
8:          else
9:             Server: Choose the action with the maximum Q-value from the Q-table;
10:         end if
11:         if $Intersect_j^{t+1}$ is the destination then
12:            $R_t = R_{\max}$;
13:         else if $Intersect_j^{t+1}$ is a local minimum then
14:            $R_t = R_{\min}$;
15:         else
16:            $R_t = \frac{Dens_{R_{current}}^l}{\max_{R \in Intersect_i} Dens_R^l} + \frac{\overline{CT}_{R_{current}}^l}{\max_{R \in Intersect_i} \overline{CT}_R^l} + \left(1 - \frac{\overline{Delay}_{R_{current}}^l}{\max_{R \in Intersect_i} \overline{Delay}_R^l}\right)$;
17:         end if
18:         Server: Update the Q-value in the Q-table according to the reward value;
19:      end for
20:   end for
21: end while
22: Server: Send the Q-table to all RSUs in the network;
23: for $i = 1$ to $N_R$ do
24:    if $\overline{Delay}_{R_i} \ge D_{threshold}$ or $Dens_{R_i} \ge Dens_{threshold}^{\max}$ or $Dens_{R_i} \le Dens_{threshold}^{\min}$ then
25:       Server: Set its reward value to $R_{\min}$;
26:       Server: Update its Q-value in the Q-table;
27:       Server: Send the Q-table to all RSUs in the network;
28:    end if
29: end for
   End
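The training loop of Algorithm 2 can be sketched as plain $\varepsilon$-greedy Q-learning over the intersection graph. The `step` and `reward_fn` callbacks below stand in for the simulated environment and the reward of Equation (12); the function name, episode counts, and the fixed discount `gamma` are illustrative simplifications (IRQ itself computes the discount factor dynamically via Equation (13)):

```python
import random

def train_q_table(intersections, actions, step, reward_fn,
                  alpha=0.1, gamma=0.9, eps=0.2, episodes=200, horizon=50):
    """Epsilon-greedy Q-learning sketch of Algorithm 2.
    step(state, action) -> next intersection;
    reward_fn(state, action, next) -> R_t."""
    q = {(s, a): 0.0 for s in intersections for a in actions}
    for _ in range(episodes):
        s = random.choice(intersections)
        for _ in range(horizon):
            if random.random() <= eps:          # explore
                a = random.choice(actions)
            else:                               # exploit
                a = max(actions, key=lambda x: q[(s, x)])
            nxt = step(s, a)
            r = reward_fn(s, a, nxt)
            best_next = max(q[(nxt, x)] for x in actions)
            # Standard temporal-difference update of the Q-value.
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = nxt
    return q
```

The congestion penalty of lines 23-29 of Algorithm 2 would then overwrite the reward of any flagged road segment with $R_{\min}$ and re-run the update before redistributing the Q-table to the RSUs.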

5.4. Local View-Based Routing Algorithm

To send packets along road segments, IRQ uses a greedy routing strategy for selecting the next-hop node. Recall that vehicles periodically share beacon messages with each other and build a neighborhood table from these messages, as described in Section 5.1. The format of this table is given in Table 2, and it is used in the local view-based routing process. The greedy strategy is used both for sending packets between vehicles on the road segment between two consecutive intersections and for sending packets between vehicles and the RSUs located at these intersections. As a result, the local view-based routing process includes two parts: V2V routing and V2I routing.
  • V2V routing: On a road segment, each vehicle uses the V2V routing process to choose the next-hop node. When the source node produces the data packet, it also obtains the position of the destination vehicle. If the two vehicles are on the same road segment, the destination node is regarded as $Target$; otherwise, the source intersection is considered as $Target$. Then, the V2V routing process is executed based on the position of $Target$. Accordingly, the source vehicle selects the vehicle closest to $Target$ as the next-hop node and sends the data packets to it. If no neighboring node is closer to $Target$ than the current node, a local optimum occurs. In this case, the current vehicle calculates a score for its neighbors using Equation (15) and sends the data packet to the node with the maximum score.
    $$S_{nexthop} = \left(1 - \frac{\sqrt{(x_{current} - x_{nexthop})^2 + (y_{current} - y_{nexthop})^2}}{\max_{j \in neighbor_{current}} \sqrt{(x_{current} - x_j)^2 + (y_{current} - y_j)^2}}\right) + \left(1 - \frac{Delay_{current,nexthop}}{\max_{j \in neighbor_{current}} Delay_{current,j}}\right) + \frac{CT_{current,nexthop}}{\max_{j \in neighbor_{current}} CT_{current,j}} \quad (15)$$
    where $(x_{current}, y_{current})$ and $(x_{nexthop}, y_{nexthop})$ are the spatial coordinates of the current node and the candidate next-hop node, respectively. $Delay_{current,nexthop}$ indicates the delay between the current node and the next-hop node, and $CT_{current,nexthop}$ is the connection time between them. $neighbor_{current}$ is the set of neighbors of the current node. The pseudo-code of this process is described in Algorithm 3.
Algorithm 3 Local view based-routing process (V2V)
  • Input:  V S : Source vehicle
  •     I n t e r s e c t S : Source intersection
  •     V i : Intermediate vehicle
  •     V D : Destination vehicle
  • Output: Next-hop node
  • Begin
1: if $V_S$ and $V_D$ are on the same road segment then
2:    $Target = V_D$;
3: else if the vehicle is $V_S$ then
4:    $Target = Intersect_S$;
5: else if the vehicle is $V_i$ then
6:    $Target = Intersect_j^{t+1}$, which is obtained using Algorithm 2;
7: end if
8: if the vehicle is $V_S$ then
9:    $V_S$: Choose the nearest next-hop node to $Target$ from its $Table_{neighbor}$;
10: else if the vehicle is $V_i$ then
11:    $V_i$: Choose the nearest next-hop node to $Target$ from its $Table_{neighbor}$;
12: end if
13: if $V_S$ or $V_i$ cannot find a next-hop node nearer to $Target$ then
14:    $V_S$ or $V_i$: Calculate $S_{nexthop}$ based on Equation (15) for all neighboring nodes in its $Table_{neighbor}$;
15:    $V_S$ or $V_i$: Select the vehicle with the maximum $S_{nexthop}$ as the next-hop node;
16: end if
17: $V_S$ or $V_i$: Forward the packet to the next-hop node;
    End
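The greedy choice and the Equation (15) fallback of the V2V process can be sketched as follows. The neighbor-record layout with `pos`, `delay`, and `ct` fields is illustrative, not the paper's table format:

```python
import math

def next_hop(current, target, neighbors):
    """V2V sketch: greedily pick the neighbor closest to the target; on
    a local optimum, fall back to the Eq. (15) score over distance,
    delay, and connection time.  Records are dicts with hypothetical
    'pos', 'delay', and 'ct' fields."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    d_current = dist(current['pos'], target)
    closer = [n for n in neighbors if dist(n['pos'], target) < d_current]
    if closer:
        return min(closer, key=lambda n: dist(n['pos'], target))

    # Local optimum: score neighbors per Eq. (15).
    max_d = max(dist(current['pos'], n['pos']) for n in neighbors)
    max_delay = max(n['delay'] for n in neighbors)
    max_ct = max(n['ct'] for n in neighbors)

    def score(n):
        return (1 - dist(current['pos'], n['pos']) / max_d
                + (1 - n['delay'] / max_delay)
                + n['ct'] / max_ct)

    return max(neighbors, key=score)
```

The fallback deliberately prefers nearby, low-delay, long-lived links, trading geographic progress for link stability when no greedy progress is possible.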
  • V2I routing strategy: A data packet reaches the next intersection using the V2V greedy forwarding strategy. Then, the RSU at the intersection selects the next intersection using the Q-table obtained from the Q-learning-based routing process described in Section 5.3 and sends the data packet into the corresponding road segment using the V2I greedy forwarding strategy. In this process, if the destination node is in this road segment, it is considered as $Target$, and the RSU sends the packet to the node closest to the destination vehicle. Otherwise, the RSU sends the packet to the node closest to the next intersection, which is considered as $Target$. If there is no vehicle available for sending the packet toward $Target$, the RSU carries (buffers) the packet until it finds a suitable next-hop node. The pseudo-code of this process is described in Algorithm 4.
Algorithm 4 Local view based-routing process (V2I)
  • Input:  I n t e r s e c t i : Intermediate intersection
  •     I n t e r s e c t D : Destination intersection
  •     R S U i : The RSU located in I n t e r s e c t i
  •     R S U D : The RSU located in I n t e r s e c t D
  •     V i : Intermediate vehicle
  •     V D : Destination vehicle
  • Output: Next-hop node
  •  Begin
1: if the intersection is $Intersect_D$ then
2:    $Target = V_D$;
3: else if the intersection is $Intersect_i$ then
4:    $Target = Intersect_j^{t+1}$, which is obtained using Algorithm 2;
5: end if
6: if the intersection is $Intersect_D$ then
7:    $RSU_D$: Choose the nearest next-hop node to $Target$ from its $Table_{neighbor}$;
8: else if the intersection is $Intersect_i$ then
9:    $RSU_i$: Choose the nearest next-hop node to $Target$ from its $Table_{neighbor}$;
10: end if
11: if $RSU_D$ or $RSU_i$ cannot find a suitable next-hop node then
12:    $RSU_D$ or $RSU_i$: Store the data packet in its buffer;
13: else
14:    $RSU_D$ or $RSU_i$: Forward the data packet to the next-hop node;
15: end if
16: while the buffer queue of $RSU_D$ or $RSU_i$ is not empty do
17:    $RSU_D$ or $RSU_i$: Check its $Table_{neighbor}$ periodically;
18:    if $RSU_D$ or $RSU_i$ finds a neighbor as the next-hop node then
19:       $RSU_D$ or $RSU_i$: Forward the packet to the next-hop node;
20:    end if
21: end while
     End
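The store-and-carry behavior of the V2I process can be sketched as a small class. The names `RSUForwarder`, `forward`, and `drain` are illustrative, and neighbors are reduced to plain coordinates for brevity:

```python
from collections import deque

class RSUForwarder:
    """Sketch of Algorithm 4: an RSU forwards each packet to the
    neighbor closest to the target and buffers packets (store-and-
    carry) whenever no neighbor is closer to the target than the RSU
    itself.  Names and record layout are hypothetical."""

    def __init__(self, pos):
        self.pos = pos
        self.buffer = deque()   # packets waiting for a suitable next hop
        self.sent = []          # (packet, next_hop) pairs actually forwarded

    @staticmethod
    def _dist(p, q):
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

    def forward(self, packet, target, neighbors):
        # Greedy V2I step: pick the neighbor strictly closer to the target.
        closer = [n for n in neighbors
                  if self._dist(n, target) < self._dist(self.pos, target)]
        if closer:
            hop = min(closer, key=lambda n: self._dist(n, target))
            self.sent.append((packet, hop))
        else:
            self.buffer.append((packet, target))  # store and carry

    def drain(self, neighbors):
        # Periodic neighbor-table check while the buffer is not empty.
        pending = list(self.buffer)
        self.buffer.clear()
        for packet, target in pending:
            self.forward(packet, target, neighbors)
```

`drain` would be triggered on each neighborhood-table refresh; packets for which no hop has appeared are simply re-buffered.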

6. Simulation and Evaluation of Results

In this section, the proposed method is implemented using the network simulator version 2 (NS2) to evaluate its performance. NS2 is an event-driven simulation tool that is useful for evaluating the dynamic nature of communication networks and can implement both wired and wireless network protocols, such as routing algorithms; for more information, refer to [40]. IRQ is compared with three methods, namely IV2XQ [34], QGrid [33], and GPSR [32], in terms of packet delivery rate (PDR), average end-to-end delay, average hop count, and routing overhead. The network environment is 3 km × 3 km and includes 38 two-way road segments and 24 intersections. The density of vehicles varies across road segments, with between 5 and 20 vehicles per kilometer. The total number of vehicles in the network is 450, and the speed of vehicles is 14 m/s. The transmission radius of each vehicle is between 250 and 300 m, and the transmission radius of each RSU is 300 m. The simulation time is 1000 s. The broadcast period of beacon messages is one second, and the traffic message is updated every 5 s. The packet sending rate is 1-6 packets/s, and the size of each packet is 512 bytes. In the Q-learning algorithm, the learning rate is 0.1, and IRQ utilizes the ε-greedy strategy in the exploration and exploitation process with ε = 0.2. The simulation parameters are summarized in Table 5.

6.1. Packet Delivery Rate

Packet delivery rate (PDR) indicates the ratio of all packets received by the destination nodes to all packets sent by the source nodes. Two experiments are performed to evaluate it. In the first experiment, the relationship between PDR and the packet sending rate (PSR) is evaluated, as shown in Figure 4. According to this figure, there is an inverse relationship between PSR and PDR in all routing methods: PDR falls as PSR rises. When the packet sending rate increases, more packets are produced in the network; this raises the network load and fills the buffer capacity of vehicles, which leads to packet loss and reduces the packet delivery rate. In the second experiment, the relationship between PDR and the signal transmission radius (STR) of vehicles is analyzed. Figure 5 shows a direct relationship between PDR and STR: the packet delivery rate improves as the signal transmission radius of vehicles increases. With a larger STR, vehicles can communicate with more vehicles over a wider range and can therefore find more appropriate next-hop nodes for transferring data packets, which reduces the probability of becoming trapped in a local optimum. According to Figure 4 and Figure 5, IRQ has the best packet delivery rate among the compared methods. In Figure 4, IRQ improves PDR by 4.29%, 19.67%, and 25.86% compared to IV2XQ, QGrid, and GPSR, respectively. Moreover, according to Figure 5, IRQ increases PDR by 4%, 18.18%, and 21.87% compared to IV2XQ, QGrid, and GPSR, respectively. In IRQ, the best route for sending data packets is selected based on fresh traffic information.
In this scheme, the central server selects the best route between intersections based on the density of vehicles, the average delay, and the average connection time on the road segments. In addition, the central server can detect and prevent congestion in the network, which improves the packet delivery rate. In IV2XQ, the central server makes its routing decisions based only on the historical traffic information stored in its memory, and this information is never updated; moreover, IV2XQ considers only density information when choosing the optimal path. QGrid utilizes a greedy strategy to select the optimal grid and chooses the grid with the maximum density; however, excessive density can cause congestion in the network and increase packet loss. GPSR also uses a greedy strategy on the road segments, which increases the probability of becoming trapped in a local optimum and leads to packet loss.

6.2. End-to-End Delay

The average end-to-end delay indicates the average time required to send packets from the source node to the destination node. In Figure 6, the end-to-end delay is evaluated based on the packet sending rate (PSR). As shown in this figure, when the packet sending rate is high, the end-to-end delay increases in all routing schemes because a high PSR leads to network congestion; data packets then wait longer in the buffer queue, which increases the delay of the data transmission process. In Figure 7, the end-to-end delay is evaluated based on the signal transmission radius of vehicles. According to this figure, as the STR of vehicles grows, the end-to-end delay gradually decreases in all routing methods because the number of hops in the routing path decreases, which improves delay in the routing process. According to Figure 6 and Figure 7, IRQ has the minimum delay among the compared methods. In Figure 6, IRQ lowers delay by 8.05%, 21.57%, and 51.81% compared to IV2XQ, QGrid, and GPSR, respectively. Additionally, according to Figure 7, IRQ improves the end-to-end delay by 18.52%, 32.25%, and 60.71% compared with IV2XQ, QGrid, and GPSR, respectively. The first reason is that, in IRQ, the central server allocates more reward to low-delay routes in the routing process between intersections; as a result, low-delay paths are selected for data transmission. The second reason is IRQ's congestion control mechanism: if the delay on a route exceeds a threshold, the central server penalizes this path so that it is not used for data transfer. The third reason is that, in the V2V routing process, nodes with less delay have a higher chance of being selected as the next-hop node. As shown in Figure 6 and Figure 7, IV2XQ also has an acceptable delay because it uses a congestion control mechanism.
According to this mechanism, if the occupied space of the vehicle buffer exceeds a threshold on a road segment, an alternative route is selected for sending packets so that the data transmission process does not experience high delay. QGrid, however, has no mechanism for controlling congestion in the network, which is the most important reason for its increased delay. GPSR experiences the worst delay among the compared methods because of the local optimum problem.

6.3. Hop Count

The hop count represents the average number of intermediate nodes that a data packet traverses along the routing path between the source node and the destination node. Figure 8 shows the average hop count based on the packet sending rate. According to this figure, the number of hops in the routing paths increases in all routing schemes as PSR grows, because network congestion increases and affects the routing process. IRQ reduces the hop count by 14.35% and 5.89% compared to IV2XQ and GPSR, respectively. However, IRQ is weaker than QGrid in terms of hop count (by approximately 4.32%). In Figure 9, the average hop count is evaluated based on the signal transmission radius. As shown in this figure, there is an inverse relationship between hop count and STR: when the signal transmission radius of vehicles is larger, the number of hops on a route to the destination decreases. IRQ reduces the hop count by 21.50% and 8.46% compared to IV2XQ and GPSR, respectively. However, it has a higher hop count (by almost 7.23%) than QGrid.

6.4. Communication Overhead

Communication overhead is defined as the ratio of failed data packets in the data transmission process, together with the other packets used for discovering and maintaining routes, to all packets produced in the network. Figure 10 displays the communication overhead based on the packet sending rate. Note that the communication overhead increases when the packet sending rate is high, because a high PSR raises the collision probability due to network congestion; packet loss then increases, lost packets must be re-transferred, and new paths must be formed in the network, all of which incurs high communication overhead. In Figure 11, the methods are compared in terms of routing overhead based on STR. As shown in this figure, there is an inverse relationship between routing overhead and STR: when the transmission radius is larger, the routing overhead is lower. This is rooted in the fact that a large transmission radius improves PDR, which reduces the need for re-transferring data packets and consequently lowers routing overhead. According to Figure 10, IRQ reduces the routing overhead by 36.26% and 46.79% compared to QGrid and GPSR, respectively. However, it has more routing overhead than IV2XQ (approximately 23.40%). Moreover, based on Figure 11, IRQ decreases routing overhead by 30.19% and 33.93% compared to QGrid and GPSR, respectively, but has more routing overhead than IV2XQ (almost 12.12%). This is because, in IV2XQ, the central server uses only the historical traffic information stored in its memory to discover routes between intersections and does not exchange any control messages in this process; as a result, its routing overhead is very low. IRQ, by contrast, uses traffic messages to discover routes between intersections, which increases its routing overhead compared to IV2XQ.
GPSR has the worst overhead because it suffers from the local optimum problem. In addition, QGrid performs worse than IRQ in terms of routing overhead, which is rooted in its lack of a congestion control mechanism; this increases packet loss in the network and, consequently, the need for re-transferring data packets.

7. Conclusions

In this paper, an intersection-based routing method using Q-learning (IRQ) was suggested for vehicular ad hoc networks in the intelligent transportation system. IRQ is a hierarchical routing method that uses both global and local views of the network. It consists of three main steps: the dissemination mechanism of traffic information, the global view-based routing algorithm, and the local view-based routing algorithm. In the first phase, a dissemination mechanism of traffic information was introduced; this mechanism is responsible for creating the global and local views of the network. In the second phase, a Q-learning-based routing technique was designed to find the best routes between intersections. This phase also provides a congestion control mechanism: the central server continuously evaluates the formed paths and penalizes routes with high congestion to improve the packet delivery rate. In the last phase, IRQ uses a greedy routing strategy based on the local view to find the best next-hop node. Finally, IRQ was simulated using NS2, and its performance was examined in terms of packet delivery rate, end-to-end delay, hop count, and routing overhead; the results were compared with IV2XQ, QGrid, and GPSR. These results demonstrate the acceptable performance of IRQ in terms of packet delivery rate and delay, although its communication overhead is higher than that of IV2XQ. In future work, IRQ can be tested under further scenarios to better determine its advantages and disadvantages, and clustering techniques can be explored to reduce routing overhead and improve network performance.

Author Contributions

Conceptualization, M.H. and A.M.; methodology, M.H. and M.U.K.; validation, M.U.K.; investigation, M.H. and A.M.; resources, M.H., A.M. and M.U.K.; writing—original draft preparation, M.U.K.; supervision, M.H.; project administration, M.H. and A.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Al-Shareeda, M.A.; Anbar, M.; Hasbullah, I.H.; Manickam, S. Survey of authentication and privacy schemes in vehicular ad hoc networks. IEEE Sens. J. 2020, 21, 2422–2433. [Google Scholar] [CrossRef]
  2. Rashid, S.; Khan, M.A.; Saeed, A.; Hamza, C.M. A Survey on Prediction based Routing for Vehicular Ad-hoc Networks. In Proceedings of the IEEE 2021 International Congress of Advanced Technology and Engineering (ICOTEN), Online, 4–5 July 2021; pp. 1–8. [Google Scholar] [CrossRef]
  3. Ameur, A.I.; Lakas, A.; Bachir, Y.M.; Oubbati, O.S. Peer-to-peer overlay techniques for vehicular ad hoc networks: Survey and challenges. Veh. Commun. 2022, 34, 100455. [Google Scholar] [CrossRef]
  4. Xia, Z.; Wu, J.; Wu, L.; Chen, Y.; Yang, J.; Yu, P.S. A comprehensive survey of the key technologies and challenges surrounding vehicular ad hoc networks. ACM Trans. Intell. Syst. Technol. (TIST) 2021, 12, 1–30. [Google Scholar] [CrossRef]
  5. Shahwani, H.; Shah, S.A.; Ashraf, M.; Akram, M.; Jeong, J.P.; Shin, J. A comprehensive survey on data dissemination in Vehicular Ad Hoc Networks. Veh. Commun. 2021, 34, 100420. [Google Scholar] [CrossRef]
  6. Gaurav, A.; Gupta, B.B.; Peñalvo, F.J.G.; Nedjah, N.; Psannis, K. Ddos attack detection in vehicular ad-hoc network (vanet) for 5g networks. In Security and Privacy Preserving for IoT and 5G Networks; Springer: Cham, Switzerland, 2022; pp. 263–278. [Google Scholar] [CrossRef]
  7. Piper, J.; Rodger, J.A. Longitudinal Study of a Website for Assessing American Presidential Candidates and Decision Making of Potential Election Irregularities Detection. Int. J. Semant. Web Inf. Syst. (IJSWIS) 2022, 18, 1–20. [Google Scholar] [CrossRef]
  8. Yang, H.; Vijayakumar, P.; Shen, J.; Gupta, B.B. A location-based privacy-preserving oblivious sharing scheme for indoor navigation. Future Gener. Comput. Syst. 2022, 137, 42–52. [Google Scholar] [CrossRef]
  9. Jeong, H.; Lee, S.W.; Hussain Malik, M.; Yousefpoor, E.; Yousefpoor, M.S.; Ahmed, O.H.; Hosseinzadeh, M.; Mosavi, A. SecAODV: A secure healthcare routing scheme based on hybrid cryptography in wireless body sensor networks. Front. Med. 2022, 9, 829055. [Google Scholar] [CrossRef] [PubMed]
  10. Rahmani, A.M.; Ali, S.; Yousefpoor, E.; Yousefpoor, M.S.; Javaheri, D.; Lalbakhsh, P.; Ahmed, O.H.; Hosseinzadeh, M.; Lee, S.W. OLSR+: A new routing method based on fuzzy logic in flying ad-hoc networks (FANETs). Veh. Commun. 2022, 36, 100489. [Google Scholar] [CrossRef]
  11. Abdel-Halim, I.T.; Fahmy, H.M.A. Prediction-based protocols for vehicular Ad Hoc Networks: Survey and taxonomy. Comput. Netw. 2018, 130, 34–50. [Google Scholar] [CrossRef]
  12. Grover, J. Security of Vehicular Ad Hoc Networks using blockchain: A comprehensive review. Veh. Commun. 2022, 34, 100458. [Google Scholar] [CrossRef]
  13. Yousefpoor, M.S.; Yousefpoor, E.; Barati, H.; Barati, A.; Movaghar, A.; Hosseinzadeh, M. Secure data aggregation methods and countermeasures against various attacks in wireless sensor networks: A comprehensive review. J. Netw. Comput. Appl. 2021, 190, 103118. [Google Scholar] [CrossRef]
  14. Tripp-Barba, C.; Zaldívar-Colado, A.; Urquiza-Aguiar, L.; Aguilar-Calderón, J.A. Survey on routing protocols for vehicular ad hoc networks based on multimetrics. Electronics 2019, 8, 1177. [Google Scholar] [CrossRef]
  15. Senouci, O.; Harous, S.; Aliouat, Z. Survey on vehicular ad hoc networks clustering algorithms: Overview, taxonomy, challenges, and open research issues. Int. J. Commun. Syst. 2020, 33, e4402. [Google Scholar] [CrossRef]
  16. Ardakani, S.P.; Kwong, C.F.; Kar, P.; Liu, Q.; Li, L. CNN: A Cluster-Based Named Data Routing for Vehicular Networks. IEEE Access 2021, 9, 159036–159047. [Google Scholar] [CrossRef]
  17. Nazib, R.A.; Moh, S. Routing protocols for unmanned aerial vehicle-aided vehicular ad hoc networks: A survey. IEEE Access 2020, 8, 77535–77560. [Google Scholar] [CrossRef]
  18. Aggarwal, A.; Gaba, S.; Nagpal, S.; Vig, B. Bio-Inspired Routing in VANET. In Cloud and IoT-Based Vehicular Ad Hoc Networks; Wiley: Hoboken, NJ, USA, 2021; pp. 199–220. [Google Scholar] [CrossRef]
  19. Ramamoorthy, R.; Thangavelu, M. An enhanced distance and residual energy-based congestion aware ant colony optimization routing for vehicular ad hoc networks. Int. J. Commun. Syst. 2022, 35, e5179. [Google Scholar] [CrossRef]
  20. Liu, J.; Weng, H.; Ge, Y.; Li, S.; Cui, X. A Self-Healing Routing Strategy based on Ant Colony Optimization for Vehicular Ad Hoc Networks. IEEE Internet Things J. 2022. [Google Scholar] [CrossRef]
  21. Nazib, R.A.; Moh, S. Reinforcement learning-based routing protocols for vehicular ad hoc networks: A comparative survey. IEEE Access 2021, 9, 27552–27587. [Google Scholar] [CrossRef]
  22. Mchergui, A.; Moulahi, T.; Zeadally, S. Survey on artificial intelligence (AI) techniques for vehicular ad-hoc networks (VANETs). Veh. Commun. 2021, 34, 100403. [Google Scholar] [CrossRef]
  23. Vijayakumar, P.; Rajkumar, S.C. Deep Reinforcement Learning-Based Pedestrian and Independent Vehicle Safety Fortification Using Intelligent Perception. Int. J. Softw. Sci. Comput. Intell. (IJSSCI) 2022, 14, 1–33. [Google Scholar] [CrossRef]
  24. Rahmani, A.M.; Ali, S.; Yousefpoor, M.S.; Yousefpoor, E.; Naqvi, R.A.; Siddique, K.; Hosseinzadeh, M. An area coverage scheme based on fuzzy logic and shuffled frog-leaping algorithm (sfla) in heterogeneous wireless sensor networks. Mathematics 2021, 9, 2251. [Google Scholar] [CrossRef]
  25. Lee, S.W.; Ali, S.; Yousefpoor, M.S.; Yousefpoor, E.; Lalbakhsh, P.; Javaheri, D.; Rahmani, A.M.; Hosseinzadeh, M. An energy-aware and predictive fuzzy logic-based routing scheme in flying ad hoc networks (fanets). IEEE Access 2021, 9, 129977–130005. [Google Scholar] [CrossRef]
  26. Rahmani, A.M.; Ali, S.; Malik, M.H.; Yousefpoor, E.; Yousefpoor, M.S.; Mousavi, A.; Hosseinzadeh, M. An energy-aware and Q-learning-based area coverage for oil pipeline monitoring systems using sensors and Internet of Things. Sci. Rep. 2022, 12, 9638. [Google Scholar] [CrossRef]
  27. Ji, X.; Xu, W.; Zhang, C.; Yun, T.; Zhang, G.; Wang, X.; Wang, Y.; Liu, B. Keep forwarding path freshest in VANET via applying reinforcement learning. In Proceedings of the 2019 IEEE First International Workshop on Network Meets Intelligent Computations (NMIC), Dallas, TX, USA, 7–9 July 2019; pp. 13–18. [Google Scholar] [CrossRef]
  28. Saravanan, M.; Ganeshkumar, P. Routing using reinforcement learning in vehicular ad hoc networks. Comput. Intell. 2020, 36, 682–697. [Google Scholar] [CrossRef]
  29. Wu, J.; Fang, M.; Li, H.; Li, X. RSU-assisted traffic-aware routing based on reinforcement learning for urban vanets. IEEE Access 2020, 8, 5733–5748. [Google Scholar] [CrossRef]
  30. Yang, X.; Zhang, W.; Lu, H.; Zhao, L. V2V routing in VANET based on heuristic Q-learning. Int. J. Comput. Commun. Control. 2020, 15, 1–17. [Google Scholar] [CrossRef]
  31. Wu, C.; Yoshinaga, T.; Bayar, D.; Ji, Y. Learning for adaptive anycast in vehicular delay tolerant networks. J. Ambient. Intell. Humaniz. Comput. 2019, 10, 1379–1388. [Google Scholar] [CrossRef]
  32. Karp, B.; Kung, H.T. GPSR: Greedy perimeter stateless routing for wireless networks. In Proceedings of the 6th Annual International Conference on Mobile Computing and Networking, Boston, MA, USA, 6–11 August 2000; pp. 243–254. [Google Scholar] [CrossRef]
  33. Li, F.; Song, X.; Chen, H.; Li, X.; Wang, Y. Hierarchical routing for vehicular ad hoc networks via reinforcement learning. IEEE Trans. Veh. Technol. 2018, 68, 1852–1865. [Google Scholar] [CrossRef]
  34. Luo, L.; Sheng, L.; Yu, H.; Sun, G. Intersection-based V2X routing via reinforcement learning in vehicular Ad Hoc networks. IEEE Trans. Intell. Transp. Syst. 2021, 23, 5446–5459. [Google Scholar] [CrossRef]
  35. Padakandla, S. A survey of reinforcement learning algorithms for dynamically varying environments. ACM Comput. Surv. (CSUR) 2021, 54, 1–25. [Google Scholar] [CrossRef]
  36. Qiang, W.; Zhongli, Z. Reinforcement learning model, algorithms and its application. In Proceedings of the IEEE 2011 International Conference on Mechatronic Science, Electric Engineering and Computer (MEC), Jilin, China, 19–22 August 2011; pp. 1143–1146. [Google Scholar] [CrossRef]
  37. Rezwan, S.; Choi, W. A survey on applications of reinforcement learning in flying ad-hoc networks. Electronics 2021, 10, 449. [Google Scholar] [CrossRef]
  38. Al-Rawi, H.A.; Ng, M.A.; Yau, K.L.A. Application of reinforcement learning to routing in distributed wireless networks: A review. Artif. Intell. Rev. 2015, 43, 381–416. [Google Scholar] [CrossRef]
  39. Rahmani, A.M.; Yousefpoor, E.; Yousefpoor, M.S.; Mehmood, Z.; Haider, A.; Hosseinzadeh, M.; Ali Naqvi, R. Machine learning (ML) in medicine: Review, applications, and challenges. Mathematics 2021, 9, 2970. [Google Scholar] [CrossRef]
  40. Issariyakul, T.; Hossain, E. Introduction to network simulator 2 (NS2). In Introduction to Network Simulator NS2; Springer: Boston, MA, USA, 2009; pp. 1–18. [Google Scholar] [CrossRef]
Figure 1. A vehicular ad hoc network.
Figure 2. Reinforcement learning process.
Figure 4. Comparison of PDR in different routing methods based on PSR.
Figure 5. Comparison of PDR in different routing methods based on STR.
Figure 6. Comparison of delay in different routing methods based on PSR.
Figure 7. Comparison of delay in different routing methods based on STR.
Figure 8. Comparison of the hop count in different routing methods based on PSR.
Figure 9. Comparison of the hop count in different routing methods based on STR.
Figure 10. Comparison of routing overhead in different methods based on PSR.
Figure 11. Comparison of routing overhead in different methods based on STR.
Table 1. The advantages and disadvantages of the related works.

Scheme | Advantages | Disadvantages
RHR [27] | Decreasing broadcast storms; using an adaptive broadcast technique by predicting the position and movement of vehicles; suitable for rural areas | Not describing how to select a fixed number of neighbors; not considering a recovery process in sparse network conditions
VRDRT [28] | Improving the performance of the routing process through DRL; predicting road traffic conditions; reducing delay in the data transmission process | Not considering a suitable method for calculating the density of vehicles on the road; depending significantly on RSUs; not suitable for highways and urban areas
QTAR [29] | Using Q-learning to improve PDR and throughput; presenting a traffic-aware routing method; high reliability; reducing end-to-end delay | Applicable only to urban areas; not estimating the movement direction of vehicles on the roads
HQVR [30] | Determining the learning rate based on link quality; increasing packet delivery rate; reducing the effect of node mobility on the convergence speed of the Q-learning algorithm; low dependence on infrastructure (RSUs) | High dependence of the Q-learning algorithm on beacon messages; slow convergence speed of the learning algorithm; applying an exploration technique based on a fixed probability
QVDRP [31] | High delay tolerance; increasing packet delivery rate; reducing the number of duplicated control messages; considering the relative velocity of vehicles | Slow convergence speed of the learning algorithm
GPSR [32] | Reducing routing overhead; reducing delay in the network | Not considering parameters such as speed, movement direction, and link lifetime in the routing process
QGrid [33] | Reducing the number of states in the Q-learning algorithm; appropriate convergence speed; reducing communication overhead; determining the discount factor based on vehicle density | Off-line routing design; no congestion control mechanism in the network; fixed Q-table during the simulation process; not considering the effect of intersections and buildings on transmission quality in each grid; not considering parameters such as speed, movement direction, and link lifetime in the routing process
IV2XQ [34] | Determining the discount factor based on the density and distance of vehicles on the road; reducing communication overhead; designing a congestion control mechanism; appropriate convergence speed; reducing the number of states in the Q-learning algorithm | Not considering parameters such as speed, movement direction, and link lifetime in the routing process; not relying on up-to-date traffic information
Table 2. Neighborhood table format.

Vehicle ID | Road ID | Spatial Coordinates | Velocity | Connection Time | Delay | Validity Time
ID_{V_j} | ID_R | (x_j(t), y_j(t)) | (v_{x,j}(t), v_{y,j}(t)) | CT_{i,j}(t) | Delay_{i,j} | VT_j
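An entry of this neighborhood table, together with the local view-based greedy next-hop selection, can be sketched as follows. This is a hypothetical in-memory representation: the field names mirror the columns of Table 2 but are our own choices, and the purely distance-based greedy criterion is a common simplification rather than the paper's exact metric.

```python
import math
from dataclasses import dataclass


@dataclass
class NeighborEntry:
    """One row of the neighborhood table (Table 2); names are assumptions."""
    vehicle_id: str          # ID_{V_j}
    road_id: str             # ID_R
    position: tuple          # (x_j(t), y_j(t))
    velocity: tuple          # (v_{x,j}(t), v_{y,j}(t))
    connection_time: float   # CT_{i,j}(t), predicted link lifetime (s)
    delay: float             # Delay_{i,j}, one-hop delay (s)
    validity_time: float     # VT_j, time at which the entry expires


def greedy_next_hop(table, my_pos, dest_pos, now):
    """Local-view greedy choice: among still-valid neighbors that make
    geographic progress, pick the one closest to the destination."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    valid = [e for e in table if e.validity_time > now]
    closer = [e for e in valid
              if dist(e.position, dest_pos) < dist(my_pos, dest_pos)]
    # Returns None when no neighbor makes progress (a local optimum).
    return min(closer, key=lambda e: dist(e.position, dest_pos), default=None)
```

The validity-time check shows why the table carries VT_j: stale entries for vehicles that have likely moved out of range are skipped during forwarding.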
Table 3. Traffic table format.

Road ID | Vehicle Density | Average Connection Time | Average Delay | Validity Time
R_up | Number of vehicles on the upper road | Average connection time on the upper road | Average delay on the upper road | VT_{R_up}
R_down | Number of vehicles on the lower road | Average connection time on the lower road | Average delay on the lower road | VT_{R_down}
R_left | Number of vehicles on the left road | Average connection time on the left road | Average delay on the left road | VT_{R_left}
R_right | Number of vehicles on the right road | Average connection time on the right road | Average delay on the right road | VT_{R_right}
Table 4. The format of the traffic information packet.

ID of RSU | ID of Intersection | Time to Live
R_up | Vehicle density on the upper road | Average connection time on the upper road | Average delay on the upper road
R_down | Vehicle density on the lower road | Average connection time on the lower road | Average delay on the lower road
R_left | Vehicle density on the left road | Average connection time on the left road | Average delay on the left road
R_right | Vehicle density on the right road | Average connection time on the right road | Average delay on the right road
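The traffic information packet of Table 4 carries per-road statistics for the four segments around an intersection, plus a time-to-live that bounds its dissemination. A minimal sketch of this structure and of TTL-based forwarding is shown below; the class and field names are our assumptions, not the paper's wire format.

```python
from dataclasses import dataclass, field


@dataclass
class RoadStats:
    """Per-segment statistics carried in the packet (one row of Table 4)."""
    density: int               # number of vehicles on the segment
    avg_connection_time: float  # average link lifetime (s)
    avg_delay: float            # average one-hop delay (s)


@dataclass
class TrafficInfoPacket:
    """Hypothetical sketch of the RSU traffic information packet."""
    rsu_id: str
    intersection_id: str
    ttl: int
    # keys 'up', 'down', 'left', 'right' -> RoadStats for that segment
    roads: dict = field(default_factory=dict)


def forward(packet):
    """Decrement TTL on each hop; return None once the packet must be dropped."""
    if packet.ttl <= 1:
        return None
    packet.ttl -= 1
    return packet
```

The TTL field limits how far each RSU's local measurements propagate, which keeps the global view at the central server fresh without flooding the whole network.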
Table 5. Simulation parameters.

Parameter | Value
Simulator | NS2
Simulation environment (km²) | 3 × 3
Simulation time (s) | 1000
Total number of vehicles | 450
Number of road segments | 38
Number of intersections | 24
Vehicle density (vehicles/m) | 0.005–0.02
Velocity of vehicles (m/s) | 14
Transmission radius of vehicles (m) | 250–300
Transmission radius of RSUs (m) | 300
Packet size (bytes) | 512
Packet sending rate (packets/s) | 1–6
Beacon broadcast interval (s) | 1
Traffic broadcast interval (s) | 5
Learning rate (α) | 0.1
Probability ε | 0.2
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Khan, M.U.; Hosseinzadeh, M.; Mosavi, A. An Intersection-Based Routing Scheme Using Q-Learning in Vehicular Ad Hoc Networks for Traffic Management in the Intelligent Transportation System. Mathematics 2022, 10, 3731. https://doi.org/10.3390/math10203731
