An Intelligent Congestion Control Strategy in Heterogeneous V2X Based on Deep Reinforcement Learning

Wang, Hui; Li, Haoyu; Zhao, Yuan

doi:10.3390/sym14050947

by Hui Wang ¹,*, Haoyu Li ² and Yuan Zhao ¹

¹ School of Computer Science and Engineering, Xi’an Technological University, Xi’an 710021, China

² Faculty of Transportation Engineering, Kunming University of Science and Technology, Kunming 650500, China

* Author to whom correspondence should be addressed.

Symmetry 2022, 14(5), 947; https://doi.org/10.3390/sym14050947

Submission received: 23 March 2022 / Revised: 26 April 2022 / Accepted: 27 April 2022 / Published: 6 May 2022

Abstract: High mobility and complex mobile behavior are the main characteristics of nodes in Vehicle to Everything (V2X). These characteristics mean that resource deployment cannot effectively meet users' demands for differentiated service quality. The main objective of this study is therefore to propose an intelligent congestion control strategy based on deep reinforcement learning (ICCDRL) in heterogeneous V2X, which can meet the diverse service needs of vehicles to some extent and thereby address the problem of network congestion effectively. The proposal is implemented in three steps. Firstly, the paper establishes a congestion control model based on DRL. Secondly, a large amount of QoS data is used as the training set to optimize the model. Finally, the congestion sensitivity factor is used to select the size of the congestion window for the next moment, resulting in an intelligent congestion control strategy driven on demand by QoS. For verification, a series of simulation experiments are designed on the ns-3 simulation platform. The results show that the proposed ICCDRL outperforms the traditional algorithms in terms of throughput, convergence, friendliness and fairness, and can effectively guarantee real-time, reliable information interaction in V2X.

Keywords: heterogeneous V2X; deep reinforcement learning (DRL); intelligent congestion control; congestion sensitive factor; QoS

1. Introduction

Vehicle to Everything (V2X) is a ubiquitous network that realizes a comprehensive interconnection of everything, from Vehicle to Vehicle (V2V), Vehicle to Infrastructure (V2I), Vehicle to Pedestrian (V2P) and Vehicle to Network (V2N) [1]. In the 5G network environment, the Internet of Vehicles involves five major elements: people, vehicles, roads, network connections and service platforms. Moreover, V2X is a complex heterogeneous network with rapid expansion of communication service data and coexistence of cellular network and Internet of Vehicles.

Dedicated Short Range Communication (DSRC) and Cellular Vehicle to Everything (C-V2X) are the two main technical standards for V2X communication [2]. DSRC implements V2X communication through a series of communication protocols, including the IEEE 802.11p protocol and the IEEE 1609.1.4 protocol [3]. The former supports wireless access in vehicular environments, while the latter supports network services and multi-channel operation. Moreover, DSRC can efficiently form dynamic vehicular networks that are better adapted to vehicle mobility characteristics, enabling dynamic and stable connections between fast-moving vehicles and roadside facilities. C-V2X is a cellular-network-based vehicular communication technology proposed by the 3GPP organization, which can realize vehicle communication with the help of cellular networks such as Long-Term Evolution (LTE) and 5G, and has the advantages of wide coverage, high capacity, low latency and high quality of service [4].

DSRC originated from the IEEE 802.11 standard, which was originally designed for low-mobility wireless LANs and has drawbacks such as a low data rate, limited coverage and a low quality of service. The IEEE 802.11p protocol uses the Carrier Sense Multiple Access (CSMA) mechanism, which is prone to frequent collisions in congestion scenarios, leading to a sharp deterioration in network performance. Although the use of DSRC technology enables vehicles to form dynamic self-organizing networks, the high-speed mobility of nodes increases the instability of network topological connections, which affects functions such as routing and addressing [5], and further causes network congestion. In addition, there are many obstacles affecting the line-of-sight in cities, which greatly degrade DSRC communication performance [6], and the use of DSRC technology in some harsh driving environments entails significant safety risks. Some scholars have proposed the use of heterogeneous communication access technology combining DSRC and C-V2X in V2X networks to solve the problem that the communication process of V2X cannot adapt to high-speed mobility [7,8,9,10].

With the increasing number of motor vehicles, roads are becoming more and more congested. When the density of vehicles within the communication range reaches a threshold value, congestion is easily triggered. Frequent information collisions then occur, making it impossible to broadcast safety information in a timely manner, and network performance deteriorates dramatically, leading to frequent traffic accidents. Therefore, how to solve congestion control in vehicle communications is a problem worthy of attention and research. Here, the Transmission Control Protocol (TCP), as the key technology of network congestion control, plays a crucial role in V2X. In addition, the size of the congestion window varies along a curve resembling a normal distribution, which exhibits a certain degree of local symmetry.

The topology of heterogeneous V2X is complex. The high-speed mobility of vehicles and the dynamic nature of network topology make the distribution of computing resources uneven in several spatio-temporal dimensions such as communication traffic, channel information, node density and network environment. In short, the communication process of V2X cannot adapt to the characteristics of the high-speed mobility of vehicles and dynamic changes of data, which eventually leads to low utilization and a poor flexibility of resource allocation in the V2X network. Therefore, to cope with this severe challenge, it is necessary to deeply fuse V2X massive heterogeneous data, comprehensively analyze communication behavior, and design congestion control algorithms suitable for vehicle communication to alleviate network congestion and provide efficient network services [11].

The applications of V2X are diversified, and its communication demands are also diversified at the same time [12,13,14,15]. For different types of applications, the importance of information (system broadcast information, safety application information, traffic management-type application information and entertainment application information, etc.) in vehicle communications is different, which leads to different requirements for network performance parameters during transmission. Moreover, Quality of Service (QoS) plays a crucial role in influencing the different services in V2X.

Currently, most traditional congestion control algorithms consider how to utilize bandwidth or delay more efficiently. In recent years, scholars at home and abroad have done a lot of in-depth research on DRL-based V2X congestion control and proposed various algorithms [16,17,18]. However, these algorithms do not combine QoS with the different service requirements in vehicular networking, such as safety, convenience and entertainment, to control network congestion in V2X. This makes it difficult to meet the diversified service requirements of large-scale vehicle users.

This paper proposes an intelligent congestion control strategy in heterogeneous V2X based on deep reinforcement learning (ICCDRL) to address the diverse needs of network applications for connected vehicle end devices. We summarize the major contributions of this work as follows.

Firstly, through an analysis of the heterogeneous V2X network architecture, a congestion control model based on deep reinforcement learning (DRL) is established using Markov’s random memoryless property;
Secondly, we obtain the QoS parameters of different services in vehicle communications, calculate the minimum cost according to the importance of the service, and define the overhead weights and congestion sensitivity factor $\lambda_{ODL}$ according to the different importance of services;
In addition, the deep reinforcement learning PPO2 algorithm observes the current network state information and learns from historical experience. A large amount of historical QoS data is used as a training set to optimize actions, and the congestion window size at the next moment is selected by combining the current state information of the network with the congestion-sensitive factor. Thus, an intelligent congestion control strategy driven on demand by QoS is formed;
Finally, we build the ns-3 simulation platform to verify the performance of the ICCDRL proposed in the paper.

The paper consists of five sections and is organized as follows. Related works are discussed in more detail in Section 2. In Section 3, we depict the construction of the intelligent congestion control model for heterogeneous vehicle networking based on DRL, and introduce the details of the ICCDRL technology. Section 4 describes the design and the process of the experiments in detail before analyzing the performance and convergence of the ICCDRL. Finally, the paper is summarized in Section 5.

2. Research Background and Related Works

Heterogeneous V2X is based on VANETs, WLAN, LTE and 5G networks, centered on cloud computing and cloud storage platforms, and assisted by distributed nodes (base stations, roadside units, vehicles and mobile devices, etc.) deployed at the edge of the network. Finally, an intelligent heterogeneous network with edge-cloud collaboration is formed, as shown in Figure 1. The paper discusses in detail the congestion control in heterogeneous V2X based on this architecture.

In recent years, the level of car ownership is rapidly increasing, yet the increment of road traffic facilities is very limited, and the consequent problem is increasingly serious traffic congestion. Moreover, the high density of vehicles is very likely to lead to channel congestion, which seriously affects the efficiency of information dissemination. Therefore, scholars at home and abroad have conducted a lot of research on the congestion control problem in vehicle networks.

Congestion control has been a hot topic in network transmission research. Algorithms such as NewReno [19], Cubic [20], Vegas [21] and Westwood [22] are traditional TCP algorithms. Among them, NewReno and Cubic are suitable for traditional Internet network environments, use the loss of packets to judge whether there is congestion, and reduce the congestion window. Vegas uses delay to determine whether there is congestion. Westwood predicts the transmission capacity of the link to determine whether it is congested, determines the transmission speed according to the number of returned ACK packets, and further adjusts the window size and slow start threshold.

In recent years, with the rapid development of the Internet of Vehicles (IoV), scholars at home and abroad have conducted a lot of research on congestion control of the IoV. Based on the designed information delivery rate model, the literature [23] proposes a distributed control strategy for IoV channel congestion to ensure the efficiency of basic safety information dissemination. For congestion control at the MAC layer in the IoV, the literature [24] proposes a congestion control strategy based on a tabu search algorithm, which determines whether to initiate congestion control by comparing with a threshold value. Tan et al. [25] proposed a distributed congestion control strategy based on network utility maximization theory, named Utility-Based Rate Congestion Control (UBRCC). This algorithm obtains the packet sending rate by updating the “price” of vehicle congestion, and finally realizes the channel allocation for the safety requirements of an individual vehicle.

Previous work on congestion control has indicated that these TCP algorithms are mainly packet loss feedback-based protocols, delay feedback-based protocols and link capacity-based protocols. The mechanism of each of these protocols is to use some defined set of rules to adjust the size of the congestion window. However, due to the significant characteristics of vehicle communications, namely complexity and high-speed mobility, it is obvious that the algorithms above cannot be well-adapted to vehicle communications. Therefore, researchers have proposed a TCP protocol based on learning. Compared with the other three types of traditional TCP protocols, TCP protocols based on learning can better adapt to the network state, and actively learn the network environment parameters to form more comprehensive TCP congestion control rules.

Owing to its powerful learning capability and advantages such as autonomous decision making, deep reinforcement learning (DRL) can be used to solve problems involving the large state and action spaces found in vehicular networking, and it has made great progress in areas such as unmanned driving [26,27,28], optimal scheduling, intelligent decision making and gaming. DRL has attracted considerable interest from the congestion control community in recent years. In the literature [29], a DRL-based distributed resource allocation algorithm has been proposed for vehicle-to-vehicle communication, supporting both unicast and broadcast scenarios. Each vehicle is treated as an agent that makes autonomous decisions based on the local information of the channel. In the literature [30], a DRL-based resource allocation algorithm has been proposed for the joint optimization problem of transmission mode selection and resource allocation in C-V2X to maximize the total throughput of V2I links while guaranteeing the delay and reliability of the V2V link.

Frequent network switching during the fast movement of high-speed trains is the problem addressed in [31]. This study applied deep reinforcement learning to the HSR network and proposed the Hd-TCP algorithm to mitigate the poor user experience caused by this problem. The literature [32] proposed a multi-channel intelligent access method based on deep reinforcement learning. Another study [33] proposed an adaptive online decision-making method that uses SDN and deep reinforcement learning algorithms to compute the size of the congestion window and learn the optimal policy in a stable and fast manner. For disaster 5G mmWave networks with highly mobile UAVs, a DL-TCP algorithm based on deep learning has been proposed in the literature [34], which learns the movement information and signal strength of nodes and adjusts the TCP congestion window by predicting the network disconnection and reconnection times. For Named Data Networks (NDNs), a study [35] proposed a deep reinforcement learning congestion control protocol (DRL-CCP), in which the algorithm automatically learns the optimal congestion control strategy from historical congestion control experience. Another study [36] proposed an intelligent congestion control algorithm, TCP-Drinc, which is a model-free algorithm based on deep reinforcement learning. TCP-Drinc adapts to complex and dynamic network environments by learning from past experiences and deciding how to adjust the size of the congestion window based on a certain set of measured feature values.

Great progress has clearly been made in congestion control. For its application in V2X, however, no clear advancement has been seen so far. Therefore, in the heterogeneous V2X network architecture of “vehicle-human-road-cloud”, how to design an intelligent congestion control strategy will be the key to achieving high-capacity and high-efficiency communication for intelligent, dense and complex heterogeneous communication scenarios with differentiated and high-quality communication service requirements. In view of this, the paper aims to introduce DRL into the congestion control of vehicle communications to meet the diversified and differentiated needs of V2X services and effectively ensure real-time and reliable information interaction in V2X.

3. Intelligent Congestion Control Model Based on DRL in Heterogeneous V2X

For heterogeneous V2X, due to the main features such as the complexity and diversity of network types, the high-speed mobility of nodes, and the dynamic nature of network topology, frequent switching may lead to poor user experience when nodes are connected to the network. In addition, the diversification and redundancy of transmission paths will cause changes of QoS parameters such as throughput, delay and packet loss in different IoV services, and these changes may degrade the transmission quality and eventually lead to network congestion. QoS parameters are closely related to network services. Moreover, the actual sampling value of QoS can be used to describe the actual operation status of the network more objectively, and to reflect the support degree of the network for the service from a global perspective.

Therefore, for a specific service cost in V2X, DRL is used to realize the congestion control of vehicle communications, combining the perception ability of DL with the decision-making ability of RL. RL is used to define the optimization goal, and DL provides the strategy operation mechanism, so as to design and implement intelligent congestion control based on QoS that is on-demand-driven, maximizes network operation efficiency, and ensures the requirements of low latency and high reliability. The overall framework of the ICCDRL algorithm proposed in the paper is shown in Figure 2.

In the network environment of vehicle communications, the ICCDRL takes the observed QoS parameters, including the number of ACK packets, round-trip delay RTT, throughput and packet loss rate, as the state space. By observing the state information in the environment, the policy function required by the Agent is constructed and the optimal control action is generated at the same time.

Obviously, the Agent contains information about the current state of the network, the policy function based on the neural network, and the obtained action space. The Agent adjusts the size of the congestion window based on the actions output by the policy function and optimizes the performance of V2X. After generating actions and interacting with the environment, the Agent gains rewards from the environment. The Agent judges the merit of the selected action based on the reward value and updates the parameters of the policy function so that the policy function can generate actions that harvest more reward values.

According to the definition of the reward function, the optimal congestion control action is selected to adjust the size of the congestion window, so as to maximize the long-term reward received and realize intelligent congestion control. Moreover, the Agent and the environment continuously interact with each other and try to find a balance between throughput, latency and packet loss, exploring the optimal policy and giving feedback by taking successive actions (e.g., changing $cwnd$) to achieve their desired goals, i.e., large throughput and low latency.

3.1. Basic Model

The network performance parameters obtained by a user in V2X are only related to the current network state and the current selected action, and have nothing to do with the previous network state, which conforms to Markov characteristics. Therefore, this section uses Markov’s random memoryless property to establish a congestion control model based on DRL to describe the changing process of congestion control in the Internet of Vehicles. The above congestion control process is defined as the Markov decision process $\{S, A, R, P, \gamma\}$. The specific definition is as follows:

State space $S$: the set of network states $s_t$ at the current time $t$, with $s_t \in S$; the initial state is $s_0$.

Action space $A$: the set of actions $a_t$ that can be performed at any moment $t$, with $a_t \in A$.

Transition probability $P$: the probability $P(s_{t+1} \mid s_t, a_t)$ that the network state changes from $s_t$ to the next state $s_{t+1}$ when a node takes some action $a_t$.

Reward $R$: when a node takes an action $a_t$, the network state changes from $s_t$ to the following state $s_{t+1}$ and the reward $r_t$ is obtained, which is represented by $R(s_t, a_t) = E[R_{t+1} \mid s_t, a_t]$.

Discount factor $\gamma$: the discount factor determines the relative importance of the reward at the next moment and the reward at the current moment, with $\gamma \in [0, 1)$. The reinforcement learning algorithm thus prioritizes the immediate reward, while future rewards decay in a fixed proportion at each step. When $\gamma$ is close to 0, the reward obtained at the current moment carries more weight; conversely, when $\gamma$ is close to 1, the rewards obtained at future moments become more important.

The DRL algorithm starts from the initial state $s_0$, selects action $a_t$ according to the strategy function $\pi(a_t \mid s_t)$ at the current observed state $s_t$, reaches the new state $s_{t+1}$ according to the state transition probability $P$, and obtains reward $r_{t+1}$ from the environment. The goal of reinforcement learning is to optimize the strategy function so as to maximize the expected cumulative discounted reward, which is defined as Formula (1).

$$\hat{R} = r_0 + \gamma r_1 + \dots + \gamma^{T} r_T = \sum_{t=0}^{T} \gamma^{t} r_t \tag{1}$$
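As a small numerical illustration of Formula (1), the following Python sketch accumulates a discounted return from a list of per-step rewards. The function name `discounted_return` and the reward values are hypothetical and chosen only for illustration; they are not taken from the paper's implementation.

```python
def discounted_return(rewards, gamma):
    """Return sum_{t=0}^{T} gamma**t * rewards[t], as in Formula (1)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    total = 0.0
    # Accumulate from the last reward backwards: R_t = r_t + gamma * R_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


rewards = [1.0, 1.0, 1.0]  # hypothetical per-step rewards
print(discounted_return(rewards, 0.5))  # 1 + 0.5 + 0.25 = 1.75
```

With `gamma = 0` only the first reward contributes, while values of `gamma` closer to 1 let later rewards contribute almost as much as the immediate one.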

In this paper, DRL is used as a common method to solve the Markov decision process mentioned above. By interacting with the environment in a trial-and-error way, we try to maximize the cumulative reward function to accelerate the design of the neural network, and finally obtain the optimal strategy.

3.2. Design of State Space

To implement an efficient reinforcement learning algorithm, the key is to choose a reasonable state space. Only by selecting enough information for training can the ICCDRL algorithm choose the appropriate action. However, too much state information will greatly increase the amount of calculation and slow down the learning speed. In view of this, the parameters selected in the state space $s_t$ of this paper include the size of the congestion window, the number of ACK packets fed back, round-trip time $RTT$, throughput, and packet loss rate. It can be expressed as $s_t = \{cwnd, ACK, RTT, throughput, P_x\}$.

3.2.1. Size of The Congestion Window

Assuming that the link speed between two nodes in the V2X network is $C$ (Kbps), the packet size is $\eta$ (Kbits), and the round-trip delay $RTT$ (s) of the link is constant, the maximum size of the congestion window is defined as Formula (2)

$$cwnd_{\max} = \frac{BDP}{\eta} \tag{2}$$

where $BDP$ is the link delay bandwidth product, and $BDP = C \times RTT$.

Buffer size plays an important role in expanding link capacity. For example, if the size of the send window is 100 packets, the sender is allowed to send up to 100 packets. If, in addition, the link buffer size is 20 packets, the maximum number of packets allowed to be transmitted is 120. Therefore, Formula (2) can be modified into Formula (3)

$$cwnd_{\max} \approx \frac{BDP}{\eta} + b \tag{3}$$

where $b$ represents the link buffer size, and $b$ is constant.
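The arithmetic behind Formulas (2) and (3) can be sketched in a few lines of Python. The link rate, RTT and packet size below are hypothetical values chosen so that the result reproduces the 100-packet window and 20-packet buffer of the example above; rounding the window down to a whole number of packets is an assumption of this sketch, since the formulas themselves do not specify rounding.

```python
import math


def cwnd_max(link_rate_kbps, rtt_s, packet_kbits, buffer_pkts=0):
    """Maximum congestion window in packets: BDP / eta + b (Formulas (2), (3))."""
    bdp_kbits = link_rate_kbps * rtt_s  # BDP = C x RTT
    # Assumption: a window holds whole packets, so round down.
    return math.floor(bdp_kbits / packet_kbits) + buffer_pkts


# Hypothetical link: C = 9600 Kbps, RTT = 0.125 s, eta = 12 Kbits per packet.
print(cwnd_max(9600, 0.125, 12))      # Formula (2): 100 packets
print(cwnd_max(9600, 0.125, 12, 20))  # Formula (3): 100 + 20 = 120 packets
```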

3.2.2. Number of ACK Packets Fed Back

If the sender receives three consecutive duplicate ACK packets, the sender determines that the packets are lost and resends the packets immediately. Obviously, this parameter indirectly reflects the state of network congestion. Receiving three duplicate ACK packets indicates that network congestion occurs, and the size of the congestion window should be maintained or reduced.
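The duplicate-ACK rule described above can be illustrated with a minimal Python sketch. It assumes the conventional TCP interpretation, in which a loss is signaled once three duplicates of the same ACK number have arrived after the original; the function name and the ACK sequence are hypothetical.

```python
def detect_triple_dup_ack(ack_numbers, threshold=3):
    """Return the indices at which the duplicate-ACK count reaches the threshold."""
    last_ack = None
    dup_count = 0
    loss_events = []
    for i, ack in enumerate(ack_numbers):
        if ack == last_ack:
            dup_count += 1
            if dup_count == threshold:
                loss_events.append(i)  # sender would retransmit here
        else:
            last_ack = ack  # a new ACK number resets the duplicate counter
            dup_count = 0
    return loss_events


# Hypothetical ACK stream: ACK 3 arrives once, then three more times.
print(detect_triple_dup_ack([1, 2, 3, 3, 3, 3, 4]))  # [5]
```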

3.2.3. Round-Trip Time $RTT$

$RTT$ directly reflects network congestion. If the network is congested, $RTT$ increases significantly. The congestion control algorithm can adjust the size of the congestion window according to the delay.

3.2.4. Throughput

Throughput represents the number of bytes acknowledged per second by the receiver.

3.2.5. Packet Loss Rate

Packet loss in a V2X network can be caused by congestion or by random loss. Congestion loss occurs when the transmission rate $V$ reaches the maximum capacity $C$ of the bottleneck link, when the window reaches its maximum size $cwnd_{\max}$, or when the buffer overflows. Random packet loss can be caused by signal collision, interference or attenuation in a V2X network. In addition, the random loss of packets is also affected by the data transmission rate $H$, which conforms to a Poisson distribution.

The probability of $x$ packet losses under congestion window size $S_{cw}$ is expressed by the probability mass function $P_x(S_{cw})$, which is defined as Formula (4).

$$P_x(S_{cw}) = \begin{cases} \dfrac{(H S_{cw})^{x} e^{-H S_{cw}}}{x!}, & x = 0, 1, 2, \dots \\ 0, & \text{otherwise} \end{cases} \tag{4}$$
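Formula (4) is a Poisson probability with mean $H S_{cw}$, and it can be evaluated directly with the Python standard library. The sketch below assumes that $x$ counts lost packets; the values of `rate_h` and `s_cw` are hypothetical.

```python
import math


def loss_probability(x, rate_h, s_cw):
    """Poisson probability of Formula (4) with mean rate_h * s_cw."""
    # Formula (4) is zero unless x is a non-negative integer.
    if x < 0 or int(x) != x:
        return 0.0
    mean = rate_h * s_cw
    return (mean ** int(x)) * math.exp(-mean) / math.factorial(int(x))


# Hypothetical values: H = 0.1, window size S_cw = 5 packets.
for x in range(4):
    print(x, loss_probability(x, 0.1, 5))
```

Summing `loss_probability` over all non-negative integers `x` gives 1, which is a quick sanity check on any chosen `rate_h` and `s_cw`.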

3.3. Transition Probability Matrix

The topology of a heterogeneous V2X network can be expressed as an undirected graph $G = (V, E)$, where $V = \{v_1, v_2, \dots, v_m\}$ is the node set, $E = \{e_1, e_2, \dots, e_n\}$ is the link set, and $m$ and $n$ represent the number of nodes and links.

3.3.1. Probability Distribution

$Z$ represents the Markov transition probability matrix with $N$ states, as shown in Formula (5)

$$Z = \begin{bmatrix} v[1,1] & v[1,2] & \dots & v[1,N-1] & v[1,N] \\ v[2,1] & v[2,2] & \dots & v[2,N-1] & v[2,N] \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ v[N-1,1] & v[N-1,2] & \dots & v[N-1,N-1] & v[N-1,N] \\ v[N,1] & v[N,2] & \dots & v[N,N-1] & v[N,N] \end{bmatrix} \tag{5}$$

where $v[i, j]$ represents the transition probability of the system from the $i$th state to the $j$th state, $i, j \in \{1, 2, \dots, N\}$.

$S_{cw}$ is used to represent the size of the congestion window $cwnd$, and the size of the congestion window in the $i$th state is represented by $S_{cw_i}$. $\{S_{cw_1}, S_{cw_2}, \dots, S_{cw_N}\}$ represents the range of congestion window sizes, which corresponds to a system with $N$ states, where $S_{cw_1} = cwnd_{\min}$ and $S_{cw_N} = cwnd_{\max}$. The number of states $N$ can be obtained from Formula (6)

$$N = cwnd_{\max} - cwnd_{\min} + 1 \tag{6}$$

where $cwnd_{\max}$ represents the maximum size of the allowed congestion window, and $cwnd_{\min}$ represents the minimum size of the congestion window.

Assume that $cwnd_{\max} = 7$ and $cwnd_{\min} = 2$ in the network of vehicle communications. Then, according to Formula (6), the system has only 6 states, and the set of system states is represented as $S_{cw_i} \in \{2, 3, 4, 5, 6, 7\}$, $i \in \{1, 2, \dots, N\}$. Moreover, the state transition probability $P_x(S_{cw})$ of the system from the $i$th state to the $j$th state can be calculated by the probability distribution $v[i, j]$ described by Formula (7)

$$v[i, j] = \begin{cases} P_x(S_{cw}), & i = 1,\ j = \lceil \lambda \times i \rceil \\ P_x(S_{cw}), & i = 2, \dots, N-1,\ j = \lfloor \lambda \times i \rfloor \\ 1 - P_x(S_{cw}), & i = 1, \dots, N-1,\ j = i + 1 \\ 1, & i = N,\ j = \lfloor \lambda \times i \rfloor \\ 0, & \text{otherwise} \end{cases} \tag{7}$$

where $P_x(S_{cw})$ represents the probability of the transition from the $i$th state to the $j$th state, and $i, j \in \{1, 2, \dots, N\}$. In addition, $\lambda$ is the congestion-sensitive factor, indicating in the control function that congestion may be caused by QoS parameters such as throughput, delay and packet loss in different network service application requirements.

The congestion sensitivity factor $\lambda$ is described in detail in Section 3.3.2. Assume $i, j \in \{1, 2, \dots, N\}$, $N = 6$, $\lambda = 0.8$ and $S_{cw_i} \in \{2, 3, 4, 5, 6, 7\}$ here; the state transition diagram of the system is then as shown in Figure 3.
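To make the case analysis of Formula (7) concrete, the following Python sketch fills an $N \times N$ matrix from a list of per-state loss probabilities. The function name, the loss probabilities and the clamp that keeps the fallback state at 1 or above are assumptions of this sketch rather than details given in the paper.

```python
import math


def build_transition_matrix(p_loss, lam):
    """Build the N x N matrix Z of Formula (7).

    p_loss[i - 1] is the loss probability assumed for state i;
    lam is the congestion-sensitive factor, with 0 < lam <= 1.
    """
    n = len(p_loss)
    z = [[0.0] * n for _ in range(n)]
    for i in range(1, n + 1):  # 1-based state index, as in the text
        # Fallback state after a loss: ceiling for i = 1, floor otherwise.
        down = math.ceil(lam * i) if i == 1 else math.floor(lam * i)
        down = max(1, down)  # assumption: never fall below state 1
        if i < n:
            p = p_loss[i - 1]
            z[i - 1][down - 1] += p   # loss: fall back to state `down`
            z[i - 1][i] += 1.0 - p    # no loss: advance to state i + 1
        else:
            z[i - 1][down - 1] = 1.0  # last state: Formula (7) assigns probability 1
    return z


# Hypothetical per-state loss probabilities for N = 6 and lambda = 0.8.
Z = build_transition_matrix([0.1, 0.1, 0.1, 0.1, 0.1, 0.0], 0.8)
for row in Z:
    print(row)
```

For $i = N = 6$ and $\lambda = 0.8$, this rule places the unit entry in column $\lfloor 4.8 \rfloor = 4$, whereas Formula (8) as printed shows it in column 5; the sketch follows Formula (7) literally.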

All transfer probabilities shown in Figure 3 are put into the corresponding positions of the transfer matrix $Z$, where zero represents no transfer, and the transfer matrix is obtained as shown in Formula (8), in which the rows are indexed by $i$ and the columns by $j$, both running from 1 to 6. Assume $N = 6$, $\lambda = 0.8$, $S_{cw_i} \in \{2, 3, 4, 5, 6, 7\}$ and $i \in \{1, 2, \dots, N\}$ here.

$$Z = \begin{bmatrix} P_x(S_{cw_1}) & 1 - P_x(S_{cw_1}) & 0 & 0 & 0 & 0 \\ P_x(S_{cw_2}) & 0 & 1 - P_x(S_{cw_2}) & 0 & 0 & 0 \\ 0 & P_x(S_{cw_3}) & 0 & 1 - P_x(S_{cw_3}) & 0 & 0 \\ 0 & 0 & P_x(S_{cw_4}) & 0 & 1 - P_x(S_{cw_4}) & 0 \\ 0 & 0 & 0 & P_x(S_{cw_5}) & 0 & 1 - P_x(S_{cw_5}) \\ 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix} \tag{8}$$

Here, $v_i$ is the vector of the $i$th row in the matrix $Z$, representing the probability of the system transferring from the $i$th state to each of the $N$ states of the system. As shown in Formula (9), each element represents the probability of the system transferring from the $i$th state to the $j$th state.

$$v_i = [v[i, 1], v[i, 2], \dots, v[i, N]] \tag{9}$$

As can be seen from Formula (8), taking $v_5$ as an example, the system in state 5 falls back to $S_{cw_4}$ with the random loss probability $P_x(S_{cw_5})$ and otherwise advances to $S_{cw_6}$ with probability $1 - P_x(S_{cw_5})$. Therefore, the total probability is $P_x(S_{cw_5}) + 1 - P_x(S_{cw_5}) = 1$. Moreover, in matrix $Z$, the sum of the probabilities of each row vector must always be equal to 1, as shown in Formula (10).

$$\sum_{j=1}^{N} v[i, j] = 1, \quad \forall i \in \{1, 2, \dots, N\} \tag{10}$$

Now, assume that $v^{(t)}$ is the probability distribution of the system state at time $t$, where $v^{(t)}$ is equal to the product of the probability distribution of the system state at the previous time $(t - 1)$ and the matrix $Z$, as shown in Formula (11).

$$v^{(t)} = v^{(t-1)} \times Z, \quad \forall t \in \{1, 2, \dots, \infty\} \tag{11}$$

where $v^{(0)}$ is the initial state and represents the probability distribution of the system state at moment zero.

The initial value of $cwnd$ is set to $\lceil \lambda W \rceil$, indicating that the system is in the loss degradation state of the congestion avoidance phase. In addition, the initial state row vector of the Markov chain is expressed as $v^{(0)}$, for which the probability distribution is shown in Formula (12)

$$v^{(0)}[j] = \begin{cases} 1, & j = \lceil \lambda W \rceil \\ 0, & \text{otherwise} \end{cases} \quad \forall j \in \{1, 2, \dots, N\} \tag{12}$$

where $j$ represents the values of the columns of $v^{(0)}$.
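The recursion of Formula (11), started from a one-hot vector as in Formula (12), can be sketched in plain Python. The two-state matrix below is a hypothetical toy example, and the starting state is passed in directly because the value of $\lceil \lambda W \rceil$ depends on quantities not fixed here.

```python
def initial_distribution(n, start_state):
    """One-hot row vector v^(0) of Formula (12); start_state is 1-based."""
    if not 1 <= start_state <= n:
        raise ValueError("start_state must lie in 1..n")
    return [1.0 if j == start_state else 0.0 for j in range(1, n + 1)]


def step(v, z):
    """One application of Formula (11): v_next = v x Z (row vector times matrix)."""
    n = len(z)
    return [sum(v[i] * z[i][j] for i in range(n)) for j in range(n)]


# Hypothetical 2-state chain, used only to show the mechanics.
Z = [[0.5, 0.5],
     [1.0, 0.0]]
v = initial_distribution(2, 1)
for t in range(1, 3):
    v = step(v, Z)
    print(t, v)
```

Because every row of `Z` sums to 1, each `v` produced by `step` remains a probability distribution.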

3.3.2. On-Demand-Driven Congestion Sensitivity Factor Based on QoS

Vehicle communications generate complex and diverse network services in the process of vehicle movement and communication, including safety and entertainment applications. Among them, safety applications broadcast basic safety information and emergency safety information in real time, enabling vehicles to monitor road conditions and other vehicles so as to respond to emergencies in a timely manner, for example through safety early warnings, vehicle guidance, and emergency collision avoidance.

In addition, when the vehicle node moves, it needs to obtain the data of application services such as GPS, speedometer, and acceleration sensors in time, while during communication, it may need to obtain the data of business scenarios such as calls, text messages, and traffic. Therefore, for different types of complex and diverse network services in V2X, the required QoS parameters such as bandwidth, capacity, and RTT are different. Moreover, the operating quality of these network services can be directly reflected by QoS parameters, which is an important basis for judging business service capabilities. In short, the quality of service of network services indirectly reflects the operating conditions such as whether the network is congested.

Therefore, in order to accurately and objectively perform effective congestion control for vehicle communications, the paper defines a congestion sensitivity factor

λ_{ODL}

, which is based on QoS and driven on-demand. The congestion-sensitive factor is a parameter for adjusting and controlling the size of the congestion window under the premise that the QoS indicators meet the service standards and the lowest cost of the normal operation of the network. According to the QoS parameter values of different services, the minimum cost required is calculated first, and then effective congestion control is achieved through the congestion-sensitive factor.

First, we express the service capability evaluation index of network service as cost, which is defined as

U

and has

U = [u_{o} u_{d} u_{l}] (\sum u_{i} = 1, i = o, d, l)

, where

u_{i}

represents the cost caused by selecting and using

i

as an important indicator. Furthermore,

o

,

d

and

l

represent throughput, delay and packet loss rate, respectively.

Usually, network performance can be evaluated by two types of metrics: positive metrics that are positively proportional to performance, and negative metrics that are inversely proportional to performance. Obviously, higher throughput implies better system performance, while higher latency and packet loss implies poorer system performance and may lead to congestion. Therefore, throughput

o

is a positive metric, while the other two,

d

and

l

, are negative metrics. A higher cost indicates a higher spend to maintain the operation of the service.

In summary, the cost function

μ_{i}

is obtained by normalizing the evaluation index domain

U

according to the positive and negative indicators, as shown in Equation (13).

μ_{i} = \begin{cases} \frac{u_{i} - u_{i}^{\min}}{u_{i}^{\max} - u_{i}^{\min}}, & i \text{ is a positive metric} \\ \frac{u_{i}^{\max} - u_{i}}{u_{i}^{\max} - u_{i}^{\min}}, & i \text{ is a negative metric} \end{cases}

(13)

For a given

i \in {o, d, l}

,

u_{i}

is the actual measured value of evaluation index

i

,

u_{i}^{\min}

is the non-permissible value of evaluation index

i

,

u_{i}^{\max}

is the satisfactory value of evaluation index

i

, and

μ_{i}

is the value obtained after dimensionless processing of the index.

In order to evaluate the network QoS performance parameters reasonably, it is necessary to remove the dimensional difference between the related parameters first, and then obtain the cost value by normalizing these parameters and different weights. The vector

Ω = [μ_{o} μ_{d} μ_{l}]

is used to represent the indicator result value.

Secondly, the weight vector of the QoS performance evaluation index is defined as

B

, which can be expressed as

B = [α_{o} α_{d} α_{l}] (\sum_{i = o, d, l} α_{i} = 1)

. Under the premise of satisfying the basic criteria of the service index, the closer to the basic criteria, the smaller the weight value. The standard value here can be set empirically.

Finally, whether the network is able to meet the service requirements of the business is the key to measuring the network status. According to this, the congestion factor is set and the congestion sensitivity factor

λ_{ODL}

for evaluating the network performance is defined, which is expressed by Equation (14).

λ_{ODL} = Ω \times B^{T} = [μ_{o} \; μ_{d} \; μ_{l}] \times [α_{o} \; α_{d} \; α_{l}]^{T} = \sum_{i = o, d, l} μ_{i} α_{i}

(14)

Through the above steps, the congestion sensitivity factor for evaluating network performance based on QoS on-demand drive is obtained. The QoS parameter values are stored in a matrix and updated periodically, and the congestion sensitivity factor is calculated from this matrix to achieve congestion control.
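The calculation in Equations (13) and (14) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the measured values, bounds, and weights are hypothetical placeholders, and the function names are assumptions introduced for the example.

```python
def normalized_index(u, u_min, u_max, positive):
    """Dimensionless index mu_i as written in Equation (13)."""
    span = u_max - u_min
    if span <= 0:
        raise ValueError("u_max must be greater than u_min")
    return (u - u_min) / span if positive else (u_max - u) / span


def sensitivity_factor(mu, alpha):
    """Weighted sum of Equation (14); both dicts are keyed by 'o', 'd', 'l'."""
    if abs(sum(alpha.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(mu[i] * alpha[i] for i in ("o", "d", "l"))


# Hypothetical measurements and bounds (assumptions; units are placeholders).
mu = {
    "o": normalized_index(800.0, 0.0, 1000.0, positive=True),   # throughput
    "d": normalized_index(100.0, 50.0, 250.0, positive=False),  # delay
    "l": normalized_index(1.0, 0.0, 4.0, positive=False),       # packet loss rate
}
alpha = {"o": 0.5, "d": 0.25, "l": 0.25}  # hypothetical weight vector B
lambda_odl = sensitivity_factor(mu, alpha)
```

With these placeholder inputs the three indices are 0.8, 0.75, and 0.75, so the weighted sum is 0.775.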

3.4. Design of Action Space

a_{t}

is the control action made on the congestion window at time

t

. The paper defines

a_{t}

as increasing the congestion window size

cwnd

by

n

segment lengths

s^{'}

, as shown in Equation (15).

cwnd = cwnd + n s^{'}

(15)

Based on the current QoS parameter information of the V2X network, the growth rate

n

of

cwnd

is determined. When network congestion occurs, set

n \leq 0

, which maintains or reduces the congestion window size, reduces the amount of data injected into the network, and relieves the pressure of network congestion. In a high-bandwidth environment, set n > 1 so that the congestion window size grows at an exponential rate. In a low-bandwidth environment, set

n = 1

, so that the congestion window size grows linearly.
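The action in Equation (15) can be sketched as below. The concrete values returned by `choose_growth` (-1 and 2) and the one-segment lower bound on the window are assumptions made for this example; the paper only states the conditions n ≤ 0, n > 1, and n = 1.

```python
def choose_growth(congested, high_bandwidth):
    """Pick the growth rate n following the three cases described above."""
    if congested:
        return -1  # assumption: any n <= 0 satisfies the congestion case
    if high_bandwidth:
        return 2   # assumption: any n > 1 satisfies the high-bandwidth case
    return 1       # low-bandwidth case, n = 1


def update_cwnd(cwnd, n, segment_size):
    """Equation (15): cwnd = cwnd + n * s'.

    The floor of one segment is an assumption added so the sketch never
    returns a non-positive window.
    """
    return max(cwnd + n * segment_size, segment_size)


SEGMENT = 1460  # hypothetical segment length in bytes
cwnd = 10 * SEGMENT
cwnd = update_cwnd(cwnd, choose_growth(congested=False, high_bandwidth=True), SEGMENT)
```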

3.5. Reward Function

According to the analysis in Section 3.3.2, it is clear that throughput

o

is a positive indicator, while the other two,

d

and

l

, are negative indicators. Assuming that the reward obtained by the network state moving from

s_{t}

to the next state

s_{t + 1}

is

r_{t}

, then the reward function is defined as shown in Equation (16).

r_{t} = α_{o} (o / o_{\max}) - α_{d} (d_{\min} / \bar{d}) - α_{l} (\bar{l} / l_{\max})

(16)

where

o

is the current throughput,

o_{\max}

is the historical maximum throughput, and

o / o_{\max}

represents the increased throughput after the execution of action

a_{t}

.

\bar{d}

is the average delay,

d_{\min}

is the historical minimum delay, and

d_{\min} / \bar{d}

represents the reduced delay after the execution of action

a_{t}

.

\bar{l}

is the average packet loss rate,

l_{\max}

is the historical maximum packet loss rate, and

\bar{l} / l_{\max}

indicates the improved packet loss rate after executing action

a_{t}

.

α_{i}

is the weight of throughput, delay, or packet loss in the reward. It reflects whether the optimization objective of the congestion control algorithm favors throughput, delay, or packet loss, and satisfies

\sum_{i = o, d, l} α_{i} = 1

.
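Equation (16) translates directly into a small function. The sketch below implements the formula exactly as written above; the input values and weights in the usage lines are hypothetical, and the zero-denominator check is an assumption added for robustness.

```python
def reward(o, o_max, d_mean, d_min, l_mean, l_max, alpha_o, alpha_d, alpha_l):
    """Reward r_t of Equation (16), term by term as written in the text."""
    if o_max <= 0 or d_mean <= 0 or l_max <= 0:
        raise ValueError("o_max, d_mean and l_max must be positive")
    throughput_term = alpha_o * (o / o_max)
    delay_term = alpha_d * (d_min / d_mean)
    loss_term = alpha_l * (l_mean / l_max)
    return throughput_term - delay_term - loss_term


# Hypothetical measurements (assumptions, not values from the experiments).
r_t = reward(
    o=900.0, o_max=1000.0,   # current and historical maximum throughput
    d_mean=100.0, d_min=50.0,  # average and historical minimum delay
    l_mean=1.0, l_max=4.0,   # average and historical maximum loss rate
    alpha_o=0.5, alpha_d=0.25, alpha_l=0.25,
)
```

For these placeholder inputs the three terms are 0.45, 0.125, and 0.0625, giving a reward of 0.2625.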

3.6. Policy Function

The paper uses the deep neural network algorithm PPO2 to obtain the approximate optimal policy function

π (a_{t} | s_{t})

. The neural network parameters are denoted as

θ

, and the optimization objective function for reinforcement learning is defined as Equation (17)

D (θ) = E_{π_{θ}} {({\hat{R}}_{t} - Q (s_{t}))}^{2}

(17)

where

{\hat{R}}_{t}

denotes the expected value of the reward,

Q (s_{t})

is the value function of state

s_{t}

, and

D (θ)

uses the least squares method to calculate a biased estimate of the value function

Q (s_{t})

, which represents the cumulative reward predicted to be obtained at the end of the round at state

s_{t}

. Therefore, the least squares method is often used to define the objective function, and the square operation ensures the non-negativity of the objective function.

The optimization objective function

D (θ)

aims to maximize the reward obtained by executing the corresponding action

a_{t}

by adjusting the parameter

θ

of the policy function. However, if the parameter

θ

is adjusted too much, frequent oscillations during gradient ascent can prevent PPO2 from converging quickly. Therefore, PPO2 introduces

clip ()

into the objective function

D (θ)

, which is defined as Equation (18)

D^{clip} (θ) = \sum_{(s_{t}, a_{t})} \min \left( \frac{π_{θ} (a_{t} | s_{t})}{π_{θ_{k}} (a_{t} | s_{t})} {\hat{A}}_{t}, \; \mathrm{clip} \left( \frac{π_{θ} (a_{t} | s_{t})}{π_{θ_{k}} (a_{t} | s_{t})}, 1 - ε, 1 + ε \right) {\hat{A}}_{t} \right)

(18)

where

clip ()

is the truncation function, defined as Equation (19)

\mathrm{clip} \left( \frac{π_{θ} (a_{t} | s_{t})}{π_{θ_{k}} (a_{t} | s_{t})}, 1 - ε, 1 + ε \right) = \begin{cases} \frac{π_{θ} (a_{t} | s_{t})}{π_{θ_{k}} (a_{t} | s_{t})}, & 1 - ε < \frac{π_{θ} (a_{t} | s_{t})}{π_{θ_{k}} (a_{t} | s_{t})} < 1 + ε \\ 1 - ε, & \frac{π_{θ} (a_{t} | s_{t})}{π_{θ_{k}} (a_{t} | s_{t})} \leq 1 - ε \\ 1 + ε, & \frac{π_{θ} (a_{t} | s_{t})}{π_{θ_{k}} (a_{t} | s_{t})} \geq 1 + ε \end{cases}

(19)

where

r = \frac{π_{θ} (a_{t} | s_{t})}{π_{θ_{k}} (a_{t} | s_{t})}

, and

r

reflects the magnitude of the change in the parameter update; the larger the value, the larger the update magnitude and vice versa. If the result of

D^{clip} (θ)

is positive, it means that the reward obtained by performing the current action is higher than the mean value and it is desired to select such actions as much as possible. On the contrary, if the result of

D^{clip} (θ)

is negative, it means that the reward for performing the current action is lower than the average value, so the function should try to avoid performing such actions later. With the help of the truncation function

clip ()

, the probability ratio r used in

D^{clip} (θ)

is limited to the interval

[1 - ε, 1 + ε]

to avoid excessive fluctuations in the update. The schematic diagram of the

D^{clip} (θ)

function is shown in Figure 4.

When the optimization objective function

D > 0

, if

r > 1 + ε

, it is truncated so that it is not too large. Similarly, when

D < 0

, if

r < 1 - ε

, it is also truncated so that it is not too small.

D^{clip} (θ)

ensures that

r

does not fluctuate drastically.
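The behavior of Equations (18) and (19) can be checked numerically with the short sketch below. It evaluates the clipped term for given probability ratios and advantage estimates; the value ε = 0.2 and the sample pairs are hypothetical choices for illustration, and no neural network or gradient step is involved.

```python
def clip(x, low, high):
    """Truncation function of Equation (19)."""
    return max(low, min(x, high))


def clipped_term(ratio, advantage, eps):
    """One summand of Equation (18) for a single (s_t, a_t) sample."""
    unclipped = ratio * advantage
    clipped = clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return min(unclipped, clipped)


def clipped_objective(samples, eps):
    """Sum of the clipped terms over (ratio, advantage) pairs."""
    return sum(clipped_term(r, a, eps) for r, a in samples)


EPS = 0.2  # hypothetical clipping range (assumption)
# Positive advantage and a ratio above 1 + eps: the ratio is capped at 1.2.
term_up = clipped_term(1.5, 2.0, EPS)
# Negative advantage and a ratio below 1 - eps: the ratio is capped at 0.8.
term_down = clipped_term(0.5, -1.0, EPS)
```

In both cases the capped ratio replaces the raw ratio, which corresponds to the two truncation cases illustrated in Figure 4.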

3.7. Description of the Algorithm ICCDRL

The paper applies the deep reinforcement learning algorithm PPO2 to congestion control in vehicle communications. The agent observes the current network state information, learns from historical experience, and optimizes its decisions based on evaluative feedback from the network environment. A large amount of historical QoS data is used as a training set to optimize the model with the help of the congestion sensitivity factor. When the action performed by a node is positively rewarded, the feedback signal is enhanced and the probability of performing that action becomes larger; otherwise, the probability of performing that action becomes smaller. The congestion window size of the next moment is then selected, thus forming an intelligent congestion control policy. The detailed steps of the ICCDRL algorithm proposed in the paper are described as follows.

Step 1: Input the initial state $s_{t} = \{cwnd, ACK, RTT, throughput, P_{x}\}$ of the network and initialize the parameter $θ$ of the policy function $π_{θ} (a_{t} | s_{t})$ ;
Step 2: Collect the actions $a_{t}$ corresponding to the state $s_{t}$ needed to run the policy function $π (a_{t} | s_{t})$ ;
Step 3: At state $s_{t}$ , perform action $a_{t}$ to obtain reward $r_{t} = α_{o} (o / o_{\max}) - α_{d} (d_{\min} / \bar{d}) - α_{l} (\bar{l} / l_{\max})$ ;
Step 4: Obtain the expectation of the reward according to $\hat{R} = r_{0} + γ r_{1} + \dots + γ^{T} r_{T} = \sum_{t = 0}^{T} γ^{t} r_{t}$ ;
Step 5: Obtain the optimization objective function according to $D (θ) = E_{π_{θ}} {({\hat{R}}_{t} - Q (s_{t}))}^{2}$ ;
Step 6: Obtain the optimal value of the optimization objective function $D (θ)$ with the help of the gradient ascent method;
Step 7: Execute action $a_{t}$ and update $cwnd$ according to $cwnd = cwnd + n s^{'}$ ;
Step 8: Repeat Step 2 through Step 7.
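The discounted sum in Step 4 can be sketched as follows. The reward sequence and discount factor in the usage lines are hypothetical; Section 4.1 reports a discount factor of 0.99 for the actual training.

```python
def discounted_return(rewards, gamma):
    """Discounted sum of rewards: r_0 + gamma * r_1 + gamma^2 * r_2 + ..."""
    total = 0.0
    # Accumulate from the last reward backwards so each step needs one multiply.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical reward sequence and discount factor, chosen for easy checking.
R_hat = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

Here the result is 1 + 0.5 + 0.25 = 1.75.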

4. Simulation Experiments and Result Analysis

Network simulator 3 (ns-3) is an open-source network simulator that draws extensively on the successful technologies and experiences of existing open-source network simulators. In addition, ns-3 has an excellent development environment with rich modules and open source code. Therefore, ns-3 can provide a high-performance network simulation environment that is closer to real networks.

For the paper, ns-3 was installed on a Linux Ubuntu 16.04 system as a simulation experiment environment to build a highly controllable and reusable simulation platform. The paper divides the experiment into two parts: OpenAI Gym and a heterogeneous V2X network model, as shown in Figure 5.

OpenAI Gym is a toolkit for reinforcement learning (RL) research. The NS3-Gym interface integrates this RL framework into the network simulator ns-3, and the package is available as open source under the GPL license [37].

First, a heterogeneous V2X network topology is built in the network simulation platform ns-3. Then, the heterogeneous V2X network model is constructed as a Gym, and the ICCDRL algorithm is trained using the NS3-Gym interface, which is a toolkit for DRL-based network protocols. The algorithm ICCDRL proposed in the paper is developed in the OpenAI Gym section. In each control time step, OpenAI Gym passes

cwnd

and

ssthresh

to the V2X network model, which in turn feeds the updated network state to the OpenAI Gym module.

ICCDRL was set with a time step of 0.1 s, and a total of 800,000 steps were trained. When the training reached 80,000 steps, ICCDRL had converged, and the reward value obtained at this time tended to be stable.

Thereafter, the paper designed different simulation experiments to compare the performance of ICCDRL with the traditional algorithms NewReno, Cubic, and Westwood, as well as DRL-based DCC [35], in terms of packet loss rate, throughput, link utilization, delay, and jitter. Furthermore, convergence, fairness, and friendliness tests of ICCDRL are conducted to verify that the intelligent congestion control algorithm based on deep reinforcement learning proposed in the paper performs better than NewReno, Cubic, and Westwood.

4.1. Simulation Environment

A classical dumbbell-type network is built in the ns-3 platform as the simulation experimental topology shown in Figure 6.

Assuming that R1 and R2 are connected through WAN links, the bottleneck link bandwidth varies randomly according to a uniform distribution, and the end-to-end aggregated data flows of the traversal topology are long life flows and short life flows, adapting to various different applications of heterogeneous vehicular networking. R1 is wired to the server, and R2 is wirelessly or 5G-connected to the vehicle node.

The simulation parameters are set according to the actual measurement data of the Telematics environment to reflect the real network as much as possible. The simulation parameters are shown in Table 1.

SUMO, a micro traffic simulation software, is used to generate traffic flow. Two scenarios are simulated: a static scenario and a high-speed moving scenario. Considering that the transmission range can cover the width of the road, the simulation is simplified to a two-way single-lane scenario with a total length of 1 km, where the nodes are randomly distributed on the road and always move within that range. Moreover, vehicle speed and direction (lane) are also selected randomly, but the range of moving speed is limited to 40–60 km/h, and the ratio of vehicles in both directions is about 1:1. Due to the high packet-sending frequency of the Telematics communication, the relative position of the vehicle changes less during the packet sending time, so the relative position of the vehicle does not have an excessive impact on the sending and receiving of messages. Therefore, the simulation uses a vehicle with a uniform motion model.

The parameters of the PPO2 algorithm used are set as follows: the number of updated rounds is 200, the discount factor

γ

is 0.99, the learning rate

δ

is 0.0025, the greedy coefficient

ε

is 0.3, and the maximum number of steps per training round is 150.

4.2. Analysis of Simulation Experiments

Experiments related to short-term flows that often occur in heterogeneous vehicular networks are conducted to compare the performance of ICCDRL with NewReno, Cubic, Westwood and DRL-based DCC for different BERs and different numbers of data flows. The better performance of ICCDRL is verified by the variation of QoS performance metrics such as congestion window size, link utilization, and throughput. Moreover, several graphs are generated to verify the high performance of the proposed congestion control strategy.

4.2.1. Static Scenario

Figure 7 depicts the real-time throughput and RTT of ICCDRL and its comparison algorithm in static scenarios.

In Figure 7, the DRL-based DCC achieves a relatively high average throughput of about 910 KB/s (90.6% of the total bandwidth) with an average RTT of 130 ms, reflecting good performance. Although the average throughput of NewReno is close to that of DRL-based DCC, NewReno has a larger RTT of about 142 ms. NewReno is relatively conservative in the congestion avoidance phase, achieving 88.7% network utilization. Westwood also achieves a high throughput, but with an RTT of 170 ms and about 61 packet losses. This is because Westwood drops its

cwnd

only when packets are lost. Cubic’s RTT is only 65 ms, but the throughput is very low, with only 35.4% link utilization. In other words, Cubic sacrifices throughput for low latency. ICCDRL outperforms these algorithms not only in terms of throughput, but also in terms of RTT. It achieves the maximum throughput, reaching 91.9% of the bottleneck bandwidth, with a very low RTT of 76 ms.

4.2.2. High-Speed Mobile Scenario

Thereafter, simulation experiments verify the performance of ICCDRL compared to other different TCP algorithms when the nodes run at 40–60 km/h in a high-speed mobile scenario. Unlike the performance of CC in the static case, each algorithm has a surge in RTT and a drop in throughput. The results show that ICCDRL achieves very low RTT and maximum throughput with stable control, even in the high-speed moving case.

(1): Comparison of Congestion window

In the network simulation environment described in Section 4.1, different services such as file, image, video, and voice transmission were carried out using ICCDRL, NewReno, Cubic, Westwood, and DRL-based DCC, respectively. The congestion windows were then compared, and the comparison graph is shown in Figure 8.

Figure 8 shows the comparison of real-time

cwnd

in a vehicle communications scenario. Unlike the traditional TCP protocols that increase their

cwnd

in a fixed manner, DRL-based DCC and ICCDRL show clear advantages. In NewReno, every detected packet loss causes the congestion window to decrease, and when the multiplicative decrease mechanism is activated, NewReno directly reduces the window to its minimum value. In contrast, DRL-based DCC and ICCDRL do not blindly reduce the congestion window whenever a packet is lost, as NewReno does, and eventually keep

cwnd

at a high level.

In addition, ICCDRL has a more stable

cwnd

than DRL-based DCC, due to the congestion factor set by ICCDRL. ICCDRL can quickly increase

cwnd

to an estimated target size, set the appropriate new window parameters, and then, at each switching interrupt, fine-tune around that target size so that the congestion window

cwnd

continues to trend upward. In this way, ICCDRL achieves maximum throughput with minimum

RTT

in the case of a short transmission gap between two switches.

(2): Comparison of Bottleneck Link Utilization

Figure 9 compares the bottleneck link utilization of ICCDRL, NewReno, Cubic, Westwood, and DRL-based DCC. Westwood was not able to efficiently utilize the growing bottleneck bandwidth and aggregated very slowly, mainly due to the simultaneous sending of data flows at 0 to 100 s. Compared to Westwood, NewReno, Cubic and DRL-based DCC significantly improve bottleneck link utilization, but their utilization is still lower than that of ICCDRL.

(3): Comparison of Throughput

The FTP was used to transfer 50–200 MB files in the same simulation environment. Meanwhile, the real-time throughput in the network was collected using Wireshark analysis software to compare the average throughput of ICCDRL with NewReno, Cubic, Westwood and DRL-based DCC at different packet loss rates, and the results of the experiment are shown in Figure 10.

It can be seen from Figure 10 that, after a period of simulation, ICCDRL outperforms the NewReno, Cubic, Westwood and DRL-based DCC algorithms in terms of throughput when the packet loss rate is low. The throughput of all algorithms decreases as the packet loss rate increases, but the average throughput of ICCDRL and DRL-based DCC remains high and relatively stable when transferring files. NewReno does not distinguish packet loss types and blindly reduces

cwnd

, leading to an unnecessary waste of resources and a relatively low average throughput. Cubic uses a cubic function to adjust the congestion window, and packet loss and retransmission cause the size of the congestion window to drop abruptly, which inevitably degrades network transmission performance. Westwood uses a conservative form of congestion avoidance with high jitter, resulting in some overall throughput degradation and relatively poor network performance.

In addition, the ICCDRL algorithm introduces a congestion sensitivity factor

λ_{ODL}

, which is used to compare the weights of the QoS parameters and the values of the threshold features to determine whether they meet the service criteria and the minimum overhead for normal network operation. ICCDRL can therefore quickly reach the maximum bandwidth and stabilize at the switching gap point by properly adjusting the size of the congestion window. In this way, frequent sharp drops in the congestion window and resulting throughput reductions can be avoided effectively. The results show that the conventional TCP algorithm cannot adapt to the characteristics of vehicular communication. On the contrary, ICCDRL has better performance with smoother throughput after a certain amount of learning and can fully utilize the network bandwidth for data transmission.

(4): Comparison of RTT

RTT represents the time it takes for a packet to be sent until an acknowledgement is received, and it directly reflects the current state of network latency. The comparison graph in RTT is shown in Figure 11. Generally, the RTT of ICCDRL and DRL-based DCC algorithms increases compared to the other three algorithms due to the fact that ICCDRL and DRL-based DCC algorithms are more aggressive, trying to utilize all available bandwidth and sending too many packets into the link, which causes network congestion and leads to an increase in RTT eventually. In addition, the ICCDRL algorithm has a smaller RTT than the DRL-based DCC algorithm because it introduces a congestion-sensitive factor to differentiate the service data weights. Moreover, the increase of RTT in ICCDRL is not significant compared with the other three algorithms.

From another perspective, this also reflects that the use of intelligent congestion control based on DRL has a significant impact on the transmission performance of vehicle communication.

(5): Comparison of Packet Loss

The packet loss rate during the node sending data is shown in Figure 12. It can be seen from Figure 12 that the packet loss rate of all five algorithms is close to 0.01%, which is consistent with the ns-3 parameter setting; the packet loss rate of ICCDRL is 0.124%, which is slightly higher than the other three traditional algorithms and lower than the DRL-based DCC. This is because ICCDRL sends the most packets, and some packets are dropped because the link node cache is full, which results in packet loss.

(6): Evaluation of Fairness and Friendliness

This section evaluates the fairness and friendliness of ICCDRL through simulation experiments. When multiple data flows compete for a bottleneck link, ICCDRL is tested for its ability to allocate bandwidth fairly. In the simulation experiment, 10 long-life FTP flows and 20 short-life FTP flows were created and tested. In each test, the flows send data simultaneously, 4 flows are randomly sampled, and the throughput of each sampled flow is recorded. The comparison of the fairness of ICCDRL at different wireless BERs is given in Figure 13.

As can be seen from Figure 13, the throughput of each data flow is basically in the range of 0.02–0.04 Mbps, with little difference when using ICCDRL in cases of low and high BER, and each data flow can basically allocate bandwidth fairly, showing that ICCDRL has good fairness. Randomly sampled data flows get different shares of bandwidth, which is due to the ICCDRL algorithm setting congestion sensitivity factors and QoS weights to distinguish the importance of different services, so that the congestion window changes more smoothly and is not too aggressive.

In addition, to verify that ICCDRL does not have a large impact on other TCP protocols, the experiments again use the dumbbell topology described above to create five short-life flows, one of which runs the ICCDRL algorithm, while the other four run NewReno, Cubic, Westwood and DRL-based DCC, respectively. The average throughput of each flow is measured to evaluate the friendliness of the ICCDRL algorithm, as shown in Figure 14.

It is obvious from Figure 14 that, although ICCDRL occupies a higher share of bandwidth than other algorithms, it does not have a large impact on the throughput of other algorithms. This is due to the fact that ICCDRL uses the PPO2 algorithm for learning and sets congestion sensitivity factors and QoS weights to distinguish the importance of different services, which makes the congestion window change more smoothly and not too aggressively, showing that ICCDRL has good friendliness.

(7): Comparison of Convergence Speed

DRL includes many different algorithms such as Deep Q Network (DQN) and Proximal Policy Optimization (PPO2). Among them, DQN uses a neural network to approximate the Q table, and after calculating the value function, the greedy algorithm is used to output Action, which is suitable for a discrete action space, while PPO2 is suitable for a continuous action space. This section compares the convergence speed of the PPO2 algorithm used by ICCDRL with that of DQN, and the experimental results are shown in Figure 15.

From the figure, it can be seen that the reward value of PPO2 stabilizes at 80,000 training steps, indicating that the PPO2 algorithm has basically converged at this point. In contrast, DQN still oscillates drastically and fails to converge after training for up to 350,000 steps. This verifies that PPO2 converges quickly and that the ICCDRL algorithm converges.

In addition, ICCDRL utilizes the PPO2 algorithm for training and designs congestion-sensitive factors to distinguish the QoS weights of different services, resulting in a more fine-grained setting of the threshold for entering slow-start and the congestion window size. ICCDRL has low computational complexity and is more versatile in vehicle communications with low latency.

5. Conclusions

In heterogeneous V2X, intelligent congestion control techniques are important for providing efficient network services. The emphasis of this paper is on congestion control. For the dynamic nature of vehicle movement and the diversity of network service types in vehicle communications, the paper proposes an ICCDRL as a solution to address the diverse needs of network applications of terminal devices, so as to ultimately achieve on-demand driven congestion control in V2X networks. First, traditional congestion control is discussed in this paper. A congestion control model based on DRL is proposed using Markov’s stochastic memoryless property to design a reasonable state space, action space and reward function. Second, the main goal of this phase of the study was to obtain the minimum cost as well as to define the overhead weights and congestion sensitivity factors

λ_{ODL}

. The QoS parameters of different services in vehicle communications were collected first, and then the minimum cost was calculated according to the different importance of the services. Finally, the simulation platform was built in ns-3, and the experimental results verify the high performance of the ICCDRL algorithm proposed in the paper.

The key advantage of ICCDRL is to differentiate the importance and congestion sensitivity factors of different services based on QoS weights, using PPO2 to learn from historical experience and using a large amount of historical QoS data as a training set to optimize the model. In addition, because the Agent of the ICCDRL algorithm combines the design idea of on-demand driven service, the congestion window changes more smoothly and is not too aggressive, unlike other DRL-based congestion control algorithms, showing a very good performance. In response to the simulation results, further optimization algorithms will be proposed to improve the slow-start threshold in the future, hoping to achieve further improvement of transmission performance in V2X.

Author Contributions

H.W. conceived and designed the whole system; H.W. and Y.Z. designed the algorithm and model; H.W. and H.L. conducted the experiment and analyzed the data; H.W. wrote the research paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research was financially supported by the Industrial research project of Science and Technology Department of Shaanxi Province (Grant No. 2016KTZDGY4-09), key research and development plan project of Shaanxi Science and Technology Department (Grant No. 2017ZDXM-GY-016), and the project of Innovation and Entrepreneurship Training Program for College Students at the national level (Grant No. S202110702110; S202110702089), and the Research project on teaching reform of education in Shaanxi province (Grant No. 20JGY016, 17JZ004, 17JY015), and the Characteristic disciplines in Education department of Shaanxi province (Grant No. 080901), the Pre-research Project of 13th Five-year Equipment Development (41402020202).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not Applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

V2X	Vehicle to Everything
V2V	Vehicle to Vehicle
V2I	Vehicle to Infrastructure
V2P	Vehicle to Pedestrian
V2N	Vehicle to Network
DRL	Deep Reinforcement Learning
QoS	Quality of Service
DSRC	Dedicated Short Range Communication
C-V2X	Cellular Vehicle to Everything
3GPP	3rd Generation Partnership Project
CSMA	Carrier Sense Multiple Access
TCP	Transmission Control Protocol
ICCDRL	Intelligent Congestion Control Strategy Based on Deep Reinforcement Learning
VANET	Vehicular Ad-Hoc Network
IoV	Internet of Vehicles
UBRCC	Utility-Based Rate Congestion Control
HSR	Hierarchical State Routing
UAVs	Unmanned Aerial Vehicle
NDNs	Named Data Networking
DRL-CCP	Deep Reinforcement Learning Congestion Control Protocol
BDP	Bandwidth-delay product
DCC	Decentralized Congestion Control
SUMO	Simulation of Urban Mobility
FTP	File Transfer Protocol
BER	Bit Error Rate
PPO2	Proximal Policy Optimization
DQN	Deep Q-Network
SDN	Software Defined Network
Hd-TCP	High-Speed TCP
DL-TCP	Deep-Learning-Based TCP

Figure 1. Network Architecture of Heterogeneous V2X.

Figure 2. The Overall Framework of The ICCDRL Algorithm.

Figure 3. Example of Markov’s State Transition Diagram.

Figure 4. Diagram of the Clip Function in different regions: (a) D > 0 and (b) D < 0.

Figure 5. Implementation Process of The Experiment.

Figure 6. Dumbbell Network Topology Diagram.

Figure 7. Throughput and RTT performances in the static scene.

Figure 8. Comparison of Congestion Window.

Figure 9. Comparison of bottleneck link utilization.

Figure 10. Comparison of throughput.

Figure 11. Comparison of RTT.

Figure 12. Comparison of Packet Loss.

Figure 13. Fairness Comparison of ICCDRL at different BERs: (a) BER is 5% and (b) BER is 20%.

Figure 14. Evaluation of Friendliness.

Figure 15. Comparison of convergence speed.

Table 1. The simulation parameters.

Parameter	Value	Parameter	Value
Scene size	0.4 km²	Modulation technology	OFDM
Scene type	Two-way single lane	Packet size (packet)	50–100 MB
Number of vehicles	0–150	Number of data flows	5–20
Movement speed of nodes	40–60 km/h	One-way time delay	60 ms
Channel type	Wireless channel	Wireless random error	0.0001
Frequency	5.9 GHz	Simulation time	800 s
Bottleneck bandwidth	200 Mbps	Data transfer rate	60 Mbps

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
