1. Introduction
Though 5G is yet to be deployed widely, it stands to affect nearly every part of our day-to-day lives, be it health care, education, transportation, industry, smart grids, or entertainment and media. 5G is expected to power the future of people’s mobility through Internet of Vehicles (IoV) technology. IoV can be called an Internet on wheels: it allows vehicles to communicate with their drivers, with other vehicles, with traffic signals and roadside infrastructure, or with any other internet-connected item. Features such as the ability to stream full video games in vehicles, crash prevention, traffic flow monitoring, safe navigation, intelligent vehicle control, vehicle autonomy, and even electronic toll collection make IoV one of the most attractive applications of 5G.
The concept of connected vehicles is known as vehicular communications or V2X. V2X includes vehicle-to-infrastructure/network (V2I/N) and vehicle-to-vehicle (V2V) [1,2]. IoV is supported by Dedicated Short-Range Communication (DSRC) and cellular mobile communication systems for effective implementation. The DSRC standards in the United States and the ITS-G5 standards produced by the European Telecommunications Standards Institute (ETSI) were both created during the last decade to coordinate the activities of the many stakeholders in vehicular communications. The ITS and smart-city protocols are grounded in IEEE 802.11p technology, which offers the framework for vehicular ad hoc network communications. The 3rd Generation Partnership Project (3GPP) has recently been working on integrating V2X services into LTE and future 5G cellular networks [3,4,5].
The combination of DSRC and cellular communication can improve intelligence and autonomous-driving capability in IoV by providing safe, intelligent, comfortable, and efficient comprehensive services. With the rapid pace of 5G development, it has become possible to meet different performance criteria in varied application scenarios [6]. The intelligent transportation system (ITS) is expected to see a boom in the coming years due to the high data rates that can be attained with 5G. Among the communication requirements of ITS are a low communication delay and high reliability of vehicle status data [7]. To provide a seamless mobility experience to the user, a handoff strategy, similar to that of cellular communication, is used in IoV. Handoffs may be frequent due to the small coverage area of Road Side Units (RSUs) and the dynamic nature of the vehicles. V2X has become a key enabler for bringing an innovative level of connectivity to automobiles, especially when combined with onboard computing and sensor technologies.
Since the Internet of Vehicles is characterized by a high level of mobility and dynamic changes in topology, handoff is one of the key technologies enabling the efficient deployment of connected and autonomous vehicles with seamless communication. The term handoff refers to seamlessly transferring the active communication from one Road Side Unit to another. A handoff can be horizontal or vertical: it is termed a horizontal handoff if the network access technologies across the RSUs are the same, and a vertical handoff if the access technologies across the RSUs differ.
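The horizontal/vertical distinction above amounts to a one-line check on the access technologies of the serving and target RSUs. A minimal sketch, in which the technology labels are illustrative assumptions:

```python
# Hypothetical sketch: classifying a handoff as horizontal or vertical from
# the access technologies of the serving and target RSUs. The technology
# names below are illustrative assumptions, not values from the paper.

def handoff_type(serving_tech: str, target_tech: str) -> str:
    """Horizontal if both RSUs use the same access technology, else vertical."""
    return "horizontal" if serving_tech == target_tech else "vertical"

# e.g. moving between two IEEE 802.11p RSUs is horizontal; moving from an
# 802.11p RSU to a 5G NR cell is vertical.
```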
With self-driving vehicles changing the transportation scenario, connected vehicles are becoming the core of transportation systems. This calls for redefining business models, be it in the transportation sector, the energy sector, or even government regulations. In such a connected scenario, security becomes one of the most important aspects of IoV, since any system failure can directly impact a user’s safety. Hence, secure and consistent authentication with low computation overhead is required for the amalgamation of 5G networks and vehicular network technology.
In IoV, communication takes place between the RSU and vehicles [8,9]. The information is transmitted over an insecure wireless channel between two communicating parties and is highly vulnerable to attacks. An efficient authentication scheme ensures that only authorized users are allowed into the network and is effective against active and passive attacks, hence satisfying the need for a secure design. In addition, it ensures that communication takes place among trustworthy entities only. For secure communication, mutual authentication among the involved entities is performed by broadcasting periodic safety messages. These messages include critical information about vehicle speed and location, traffic conditions, and braking status. Hence, it is essential to guarantee that a received safety message comes from a legitimate vehicle and has not been altered by attackers, as any modification or replaying of the broadcast messages can be disastrous to drivers [10].
Every vehicle in IoV is viewed as an intelligent object with control units, computing facilities, sensing platforms, and storage that are accessible via V2X. In IoV, communication is over a wireless link supported by 5G technology. Nonetheless, due to densification and limited resources, it is difficult to schedule resources to collect and process real-time requests from the vehicles, making it difficult for traditional IoV communications to guarantee efficient and reliable data transmission. Through resource sharing, it is possible to increase the execution speed of a computing task and overcome a vehicle’s insufficient computing resources, thus providing ultra-low latency, high bandwidth, and higher responsiveness and throughput to users [11,12]. In reality, the network status and the available resources of an RSU vary dynamically due to the mobility of vehicles. This results in frequent handoffs between the vehicle and the RSU. To coordinate with V2I links, an effective resource allocation method is required. In traditional methods, resource allocation may be stated as an optimization problem using global network information, where the QoS requirements of V2I act as a constraint [13,14].
IoV has turned into a new, hotly contested arena for innovation in the automobile industry. It calls for the development of applications such as road safety, infotainment, and efficient traffic management. With the foremost use of a vehicle still being driving, it is the automakers who will be responsible for putting IoV technologies in the vehicle [15,16,17]. Artificial Intelligence (AI) can explore and handle the unpredictable requirements of IoV to achieve this goal. AI with Machine Learning (ML) and Deep Learning (DL) technology can assist 5G networks in anticipating and managing variable network traffic. Reinforcement Learning (RL), being a type of ML, can effectively solve decision-making problems [18].
In this work, RL based on the Markov Decision Process (RL-MDP) has been used to select the best RSU in a reasonable time. However, this approach is not preferable for huge data sets, since it might not lead to the right decision. Once an RSU is selected, the vehicles are authenticated using the Deep Sparse Stacked Autoencoder Network (DS2AN), a technique that results in lower computation complexity and communication cost, and resource allocation for the vehicle is done during handoff using Deep Reinforcement Learning (DRL). The contributions of this study are as follows:
To implement a method for reducing handoff delay and to improve QoS parameter performance.
To develop a secure and fast authentication method using DS2AN during the change of RSU.
To implement the Deep Q-Learning (DQL) method to analyze node activity and resources for IoV.
To apply the Bellman-Ford algorithm to search for the shortest path between communication resources.
The paper is structured as follows: related works are discussed in Section 2. Section 3 describes the methodology adopted in this study, with mathematical modeling of the RL-MDP for the selection of RSU, DS2AN for authentication, and DRL-based resource allocation during handoff. A discussion of the results obtained is presented in Section 4, followed by the conclusion in Section 5.
2. Related Work
Awan et al. [19] have proposed an RSU selection method using a dynamic edge-backup node concept in IoV communication. The proposed method is based on clustering. During the discovery of a cluster head, a vehicle sends a message to its neighbors; if no response is received, it initiates a group formation process. Vehicles that are moving in the same direction are grouped into one cluster. Messages, which contain the location, ID, and speed, are transmitted to the peers in the group. During cluster formation, two nodes, the head node and the edge-backup node, are selected depending on a score. The score is calculated from parameters such as storage capacity, communication range, and energy. When a new node enters the cluster, a new score is calculated to decide on the new cluster head. The edge node supports the cluster head upon failure of the head node. The cluster head decides on the RSU with which a connection is established, thus enabling communication among the peers. This clustering technique has resulted in higher reliability, due to the use of an edge node, and an improved throughput. However, no significant improvement has been achieved in the packet loss rate, due to the overhead incurred by the cluster head. Still, seamless handoff of the cluster head is achieved with the edge node concept used by the authors. The authors in [20,21] have discussed energy-efficient system modeling for ad hoc networks, and in [22] multipath routing protocols for MANETs.
Hussain et al. [23] have proposed a new method for network selection called the Fuzzy Convolution Neural Network. The handoff decision, based on performance metrics like vehicle speed and signal strength, is made by utilizing a Shannon entropy-based Q-learning algorithm. Metrics such as data type, spacing, vehicular density, number of obstructions, and signal strength have been used for best network selection. V2V chain routing has been achieved through the Jellyfish optimization algorithm in order to find an optimal route amongst the available routes. The authors have been able to improve throughput by 15–20% and minimize delay and packet loss. The Fuzzy Convolution Neural Network has helped speed up the network selection process.
The work proposed by Fang Jia et al. [24] emphasizes BUS-aided selection of RSUs, built upon software-defined networking (SDN) and evolutionary games. The authors concentrate on selecting the best RSU in overlapping coverage areas of RSUs. An SDN controller communicates with the vehicles and with fixed and mobile RSUs (BUSes). It gathers data on the load, throughput, location ID, and bandwidth availability of the RSUs, and on the ID, route, location, speed, throughput, load, and bandwidth of the BUSes. Then, using the evolutionary game theory concept, the RSU that provides the best connectivity is selected. The authors have been able to achieve load balancing along with an improved throughput.
Due to the dynamic nature of the network, vehicles frequently exchange information either with an RSU or with other moving vehicles. After selecting a suitable RSU for handoff, the vehicle authenticates itself to the RSU and in turn checks the RSU’s authenticity. Once mutually authenticated, resource allocation by the RSU to the vehicle proceeds. In [25], the authors proposed a secure and efficient authentication protocol using cryptographic analysis. The authors have successfully addressed various attacks, such as impersonation, man-in-the-middle, smart card theft, session key disclosure, and replay attacks. The algorithm has resulted in enhanced security while preserving a low communication cost of 138 bytes and a computation cost of 2.262 ms, compared to related schemes.
In [26], a mutual authentication method based on the identities of the RSU and vehicle has been proposed. The system-related information is stored in the RSU using bilinear pair mapping theory and an elliptic curve encryption algorithm. The use of bilinear pair mapping and elliptic curve encryption guarantees the irreversibility of the group operation, making it impossible for attackers to access the network through reverse engineering. The legitimacy of the communicating nodes, the RSU and the OBU (On-Board Unit), is ensured through mutual authentication using IDs, shared keys, and the handshake principle.
Ping Li et al. [27] formulated a resource allocation problem to optimize the throughput of vehicular user equipment (VUEs), while balancing vehicular communication reliability and latency against the Quality of Service (QoS) of Wi-Fi networks in which VUEs coexist with Wi-Fi User Equipment (WUEs). The authors employed the listen-before-talk (LBT) method, which requires VUEs to check for other occupants of the channel before transmitting. They estimated the ideal number of offloaded vehicle users and used the Lagrange dual method to convert the optimization issue into a convex optimization problem. Experimentation showed that their approach outperforms the Greedy method in terms of throughput.
Pressas et al. [28] investigated broadcast transmission in V2V using the IEEE 802.11p standard for DSRC with a contention-based MAC protocol. With a higher packet delivery ratio and lower latency than 4G, the IEEE 802.11p protocol can provide superior performance. The authors provided a study to handle scalability issues, keeping in mind the need for an ML-based approach, and were able to demonstrate effective data-packet exchange and discover the best contention window for broadcast in V2V communication, along with an increased packet delivery ratio and throughput. In comparison with central TBSs, RSUs are small and have limited resources. In reality, the network status and the RSUs’ resources change regularly as vehicles move. As a result, a time-varying resource allocation method that takes into account the task demands and dynamic status of vehicles is required. Hence, a novel method for network selection using the RL-MDP, a fast and secure authentication method using DS2AN, and resource allocation using DQL are proposed.
In Wireless Sensor Networks (WSNs), nodes communicate with one another using wireless techniques and certain routing algorithms. In [29,30], the authors proposed an energy-efficient routing and reliable data transmission protocol for wireless sensor nodes. In WSNs, the nodes are very small, so energy-efficient mechanisms play a vital role. In this context, the Energy Efficient Routing (HEESR) protocol is implemented based on cluster-head selection. Information such as the distance to the base station and the residual node energy is used to decide on the cluster head. This reduces re-clustering, which subsequently leads to reduced energy usage.
In [31], the idea of device monitoring is proposed for the Internet of Things (IoT) and machine-type communication (MTC). A theoretical model for device monitoring through data analytics has been proposed. Using machine learning, the C50 model was trained and tested on data traffic collected from a cellular network. Performance parameters such as latency, packet loss, and throughput were considered as indicators in the experiment. In [32], the authors address the management of big data.
In [33,34,35], the authors address the DSDV and OLSR, DA-AODV, and AODV routing protocols, respectively.
The following points make it clear that the developed methods improve on the prevailing approaches discussed in the literature.
1. Some of the algorithms [25,26] have resulted in a greater number of message exchanges between the UE and the network, leading to an increase in computation cost and communication overhead.
2. Some algorithms [19] have proposed the concept of clustering. This increases the communication delay due to an increase in the messages exchanged between the vehicle nodes, cluster head, and RSU, which is time consuming.
3. Some works [19,24] have considered either decision delay alone or network selection alone. The proposed work emphasizes both, leading to best network selection and reduced decision delay.
4. The difficulty in pre-designing an accurate authentication model: model-based authentication methods are less time-efficient when used in a complex, time-varying environment.
5. The challenge of learning time-varying attributes: static authentication methods are severely affected by unpredictable variations of attributes such as wireless channel parameters.
Points 4 and 5 have been addressed in the work using RL and DQL.
Table 1 summarizes the previous approaches.
Compared with the existing literature, the RL and DQL techniques are found to be the better option, since they reduce computation cost and communication overhead. This advantage is attributed to the fact that DQL reduces the messages exchanged between the vehicle nodes and the RSU compared with conventional cryptographic message-exchange protocols, which are time consuming due to the larger number of messages exchanged among the vehicle nodes, cluster head, and RSU.
The work mainly focuses on the handoff decision phase where the vehicle selects the best RSU. The obtained results are compared with TOPSIS and GRA methods. Authentication and security of the vehicles are ensured using the Deep Sparse Stacked Autoencoder Network (DS2AN) algorithm, developed using a deep learning model. Once authenticated, resource allocation by RSU to the vehicle is done through the DRL method, a DQL technique.
3. Proposed Methodology
The work aims to enable a vehicle to select the best RSU with minimum delay while ensuring a fair amount of resource allocation to the vehicle. The Bellman-Ford algorithm is used by the RSU to determine the shortest path between itself and the destination, with the help of a Q-network. AI can explore and handle the unpredictable requirements of IoV, and RL, being a type of ML, can effectively solve decision-making problems. Hence, in this work, the RL-MDP has been used for selecting the best RSU in a reasonable time. However, this is not preferable for huge data sets, since it might not lead to the right decision. Once the RSU is selected, authentication is done using DS2AN, and resource allocation for the vehicle is done using DRL during handoff. This technique results in lower computation complexity and communication cost. In recent years, there has been a surge in applications of DL due to a rise in the volume of accessible data and a decrease in the cost of processing power.
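The shortest-path step above relies on the standard Bellman-Ford relaxation. A minimal sketch follows; the edge list, node count, and weights are illustrative assumptions, not the paper’s network:

```python
# Standard Bellman-Ford shortest-path sketch. Nodes are 0..num_nodes-1 and
# edges are (u, v, weight) tuples; the example graph below is an assumption.

def bellman_ford(edges, num_nodes, source):
    """Return the list of shortest-path distances from `source`."""
    INF = float("inf")
    dist = [INF] * num_nodes
    dist[source] = 0
    # Relax every edge up to num_nodes - 1 times.
    for _ in range(num_nodes - 1):
        updated = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                updated = True
        if not updated:
            break  # early exit once distances stabilize
    # One extra pass detects negative-weight cycles.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("negative-weight cycle detected")
    return dist

# Illustrative 4-node graph: the cheapest route 0 -> 1 goes via node 2.
edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1)]
distances = bellman_ford(edges, num_nodes=4, source=0)
```

Unlike Dijkstra’s algorithm, Bellman-Ford also tolerates negative edge weights, which is why it is a common choice when link costs can model penalties.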
3.1. RSU Selection Based on the RL-MDP
This section discusses the best RSU selection during handoff using the RL technique, based on the Markov Decision Process (RL-MDP) [36]. The MDP used to model the environment consists of five components: states (s), actions (a), decision epochs, rewards (r), and transition probabilities (P). The environment is modeled with the aim of maximizing the expected total reward per connection. Each time the agent changes state it gets a reward r(s), which is used for calculating the state value function V(s). The reward r(s, a) is obtained for being in a state “s” and taking an action “a”, and r(s, a, s′) represents the reward for being in a state, taking an action, and ending up in a new state s′. This is done to obtain an optimal policy for making the right decision in selecting the best RSU.
3.1.1. Design and Implementation
Solving the problem of identifying the best RSU using RL comprises three phases: formulating the environment into the MDP model, calculating the reward function, and estimating the total expected reward per connection using Bellman’s equation through the Value Iteration algorithm.
The network is assumed to be heterogeneous, involving “M” RSUs, and at any given point in time the vehicle has access to more than one RSU. The following are defined:
State: This includes the RSU identification number, the bandwidth available at the RSU, and the average delay incurred in connecting to the RSU. Furthermore, it is assumed that the vehicle periodically receives all of this advertised information, amongst other parameters, from each RSU within its receiving range.
Figure 1 illustrates a block schematic of the RL technique, based on the MDP.
State space (S): S = R × B × D = (R1, ⋯, RM, b1, ⋯, bM, d1, ⋯, dM).
R = (R1, ⋯, RM): the available RSUs in the network.
B = (b1, ⋯, bM): the bandwidth offered by each RSU.
D = (d1, ⋯, dM): the delay presented by each RSU.
Action (a): The vehicle takes actions considering the handoff requirement, e.g., a = (a1, a2), where a1 indicates that a handoff is necessary and a2 indicates remaining attached to the present RSU.
State transition probability (P): The probability of switching to the subsequent state s′, given the present state “s” and the taken action “a”, is P(s′ | s, a), where:
s = [h, b1, ⋯, bM, d1, ⋯, dM]: represents the current state, with h the current RSU.
s′ = [h′, b′1, ⋯, b′M, d′1, ⋯, d′M]: denotes the next state; m = 1 to M.
P(b′m, d′m | bm, dm): the transition probability of the m-th network’s bandwidth and delay.
ii. Computation of the Reward Function (r)
The calculation of the reward function is explained below. A vehicle gets a reward r(s, a) immediately when it is in state “s” and takes action “a”. Equations (2) and (3) provide the bandwidth and delay reward functions, respectively, where:
btotal: total available bandwidth;
bmin and bmax: the minimum and maximum bandwidth required by the connection, respectively;
di: the delay of RSU i, i = 1, ⋯, M;
dmin and dmax: the minimum and maximum delay incurred in connecting to the RSU.
Equation (4) provides the handoff cost function, where Kh,l represents the switching or handoff cost imposed while moving from RSU “h” to RSU “l”.
Consequently, the reward function r(s, a) for being in the current state s with the chosen action a is the reward accumulated between two successive decision epochs during a vertical handoff.
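As a hedged sketch of this reward structure, the following combines a bandwidth reward, a delay reward, and the handoff cost Kh,l in the spirit of Equations (2)–(4). The linear normalization and the weights w_b and w_d are assumptions, not the paper’s exact formulas:

```python
# Hedged sketch of a per-decision reward: a normalized bandwidth reward, a
# normalized delay reward, and a subtracted handoff cost. The linear ramps and
# the weights w_b, w_d are assumptions for illustration.

def bandwidth_reward(b, b_min, b_max):
    """0 below b_min, 1 above b_max, linear in between."""
    if b <= b_min:
        return 0.0
    if b >= b_max:
        return 1.0
    return (b - b_min) / (b_max - b_min)

def delay_reward(d, d_min, d_max):
    """1 at or below d_min, 0 at or above d_max, linear in between."""
    if d <= d_min:
        return 1.0
    if d >= d_max:
        return 0.0
    return (d_max - d) / (d_max - d_min)

def reward(b, d, b_min, b_max, d_min, d_max, switch_cost=0.0,
           w_b=0.5, w_d=0.5):
    """r(s, a): weighted sum of the two rewards minus the handoff cost K_{h,l}."""
    return (w_b * bandwidth_reward(b, b_min, b_max)
            + w_d * delay_reward(d, d_min, d_max)
            - switch_cost)
```

An RSU that offers ample bandwidth and low delay scores near 1, and staying attached (switch_cost = 0) is preferred when two candidates are otherwise equal.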
3.1.2. Bellman’s Optimality Equation
This equation is used to find the state value. The Value Iteration Algorithm (VIA) is used to determine the expected total reward and the optimal policy ɠ*(s). The best network is selected using the optimality equation given below:
The VIA determines the total expected reward and the best policy, and consequently gives the optimal state value function. Being in a current state “s”, the best RSU is selected using the optimal policy ɠ*(s), which is the maximum over all policies.
Through the VIA, the ideal state value is calculated iteratively, improving the estimate of V(s). The pseudo-code of this algorithm is provided in Figure 2; it starts with the initialization of V(s) to arbitrary random values and repeatedly updates the action value function and state values until it reaches the optimal value [37].
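The value-iteration loop just described can be sketched compactly. The tiny two-state transition model below (a “weak” and a “strong” RSU with hand-picked probabilities and rewards) is an illustrative assumption; the paper’s state space is far richer:

```python
# Minimal value-iteration sketch for an RSU-selection MDP. P maps (state,
# action) to a list of (probability, next_state); R maps (state, action) to an
# immediate reward. The toy model below is an assumption for illustration.

def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-6):
    """Return (V, policy) after the Bellman updates converge."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Bellman optimality update: best expected return over actions.
            best = max(
                R[(s, a)] + gamma * sum(p * V[s2] for p, s2 in P[(s, a)])
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy extraction from the converged value function.
    policy = {
        s: max(actions,
               key=lambda a: R[(s, a)] + gamma * sum(p * V[s2]
                                                     for p, s2 in P[(s, a)]))
        for s in states
    }
    return V, policy

# Toy example: from a "weak" link a handoff ("switch") pays off; from a
# "strong" link, staying does.
states = ["weak", "strong"]
actions = ["stay", "switch"]
P = {(s, a): [(1.0, "strong" if a == "switch" or s == "strong" else "weak")]
     for s in states for a in actions}
R = {("weak", "stay"): 0.0, ("weak", "switch"): 0.5,
     ("strong", "stay"): 1.0, ("strong", "switch"): 0.3}
V, policy = value_iteration(states, actions, P, R)
```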
This section concludes that the optimality equation leads to the maximum expected total reward and optimal policy ɠ*(s). The optimal policy ɠ*(s) helps in making the right decision in selecting the best RSU for handoff, given that the current state is s.
3.2. Authentication in IoV Using DS2AN Algorithm
System model: This work mainly focuses on vehicle-to-infrastructure communication. The system model (Figure 3) incorporates the following communication entities. The entire 5G-V2X network is controlled by the 5G core (5GC). The 5G core network consists of the security anchor function (SEAF), access and mobility management function (AMF), unified data management (UDM), authentication server function (AUSF), and authentication credential repository and processing function (ARPF). Connection and mobility management tasks are handled by the AMF, communication is handled by the SEAF, identity verification is performed by the AUSF, authentication data and key computation are done by the ARPF, and data management is carried out by the UDM. The Trusted Base Station (TBS) is a fully trusted system, mainly responsible for verifying the authenticity of RSUs, issuing certificates to the RSUs, generating public and private keys, and distributing public keys.
The On-Board Unit (OBU) within the vehicle collects information and transmits it to other vehicles and RSUs. An anti-tampering device in the OBU protects the real identity of the vehicle.
3.2.1. Mathematical Model Analysis
Figure 4 depicts a schematic illustration of the DS2AN; it has three layers: encoder, code, and decoder. The encoder takes data from the network and compresses it into the code, or bottleneck, which is transmitted across the network. This code is in turn picked up by the network and decoded to recover the original values. The DS2AN is an extension of the basic autoencoder, with sparsity applied to the hidden layers.
During authentication, the RSU sends a certificate to the vehicle, which it received from the TBS, and, in response, the vehicle sends a DS2AN code. When a vehicle wants to connect to the RSU, it will scan the network for certain channel parameters, such as the Received Signal Strength Indicator (RSSI), Reference Signal Received Power (RSRP), Channel Quality Indicator (CQI), Reference Signal Received Quality measurement (RSRQ), and Signal to Noise ratio (SNR); these are the inputs to the DS2AN. The dimensionalities of these inputs are reduced to obtain a code; this code in turn is taken up by the network. The DS2AN algorithm, present on the RSU side, decodes this code to obtain the original parameters. Sparsity is applied to the hidden layers to reduce the error in replicating the input, while restricting node information such as channel availability, traffic, and connected device. The authentication of vehicles is considered to be successful if the reconstructed values match the predefined values, thus ensuring security against attacks on the system. Once authenticated, the vehicle communicates with the desired network.
In the hidden layer, the sigmoid activation function is used and the analysis is as follows:
Encoding stage: the input vector xi (i = 1, 2, …, p) is converted into the hidden representation hi by an activation function, where W1 and b1 indicate the weight matrix and bias between the input and hidden layers, respectively.
sigm(x) denotes the logistic sigmoid function, calculated as sigm(x) = 1/(1 + e−x).
Decoding stage: the reconstructed values x̂i are mapped to the input parameters xi, and the activation function used is the softmax function, where W2 and b2 indicate the weight matrix and bias between the hidden and output layers, respectively.
The reconstruction error is defined by the likelihood function, where N is the number of training samples and xi is the ith training sample.
Through backpropagation, the model is fine-tuned by updating the weights and biases using the following respective equations:
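The encode/decode pass of this section can be sketched in NumPy: a sigmoid hidden layer producing the code hi and a softmax output layer producing the reconstruction. The layer sizes, the random weights, and the sample input are assumptions for illustration; a trained DS2AN would use learned W1, b1, W2, b2:

```python
# NumPy sketch of the DS2AN encode/decode pass: h = sigm(W1 x + b1),
# x_hat = softmax(W2 h + b2). Layer sizes and weights are illustrative
# assumptions, not trained parameters.
import numpy as np

rng = np.random.default_rng(0)

def sigm(x):
    """Logistic sigmoid, sigm(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    """Numerically stable softmax over a vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Five channel measurements (RSSI, RSRP, CQI, RSRQ, SNR), assumed already
# min-max normalized to [0, 1], compressed into a 3-unit code layer.
p, k = 5, 3
W1, b1 = rng.normal(size=(k, p)), np.zeros(k)   # input  -> hidden
W2, b2 = rng.normal(size=(p, k)), np.zeros(p)   # hidden -> output

def encode(x):
    return sigm(W1 @ x + b1)

def decode(h):
    return softmax(W2 @ h + b2)

x = np.array([0.8, 0.6, 0.7, 0.5, 0.9])
code = encode(x)        # the compressed code transmitted across the network
x_hat = decode(code)    # the reconstruction computed on the RSU side
```

On the RSU side, authentication succeeds only if the reconstruction matches the predefined values within a tolerance chosen at training time.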
The dataset used for training the DS2AN model is pre-processed in two phases: data separation and normalization. During data separation, the data is divided into a training set (75%) and a test set (25%); both datasets are prone to attacks. In the normalization phase, the value of each parameter is scaled to a range between 0 and 1. Once the model is trained with 75% of the dataset, it is tested on the remaining 25%. If there is any data mismatch, the nodes are considered unauthorized nodes and are stopped from accessing the network. The notations used in the work are provided in Table 2.
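The two preprocessing steps (min-max scaling to [0, 1] and the 75/25 split) can be sketched as follows; the sample RSSI values are made up for illustration:

```python
# Sketch of the DS2AN preprocessing: per-parameter min-max scaling to [0, 1]
# and a 75/25 train/test split. The sample values are illustrative assumptions.

def min_max_scale(values):
    """Scale a list of readings linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # degenerate case: constant column
    return [(v - lo) / (hi - lo) for v in values]

def train_test_split(rows, train_frac=0.75):
    """Split rows into a training set and a test set by position."""
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]

rssi = [-90, -70, -80, -60]              # dBm readings, illustrative
scaled = min_max_scale(rssi)             # each value now in [0, 1]
train_rows, test_rows = train_test_split(list(range(8)))
```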
3.2.2. The Authentication Process
The authentication protocol consists of three phases, namely:
1. Initialization phase—during this phase, private and public keys are generated by the TBS and loaded onto the RSU and vehicle. ECDSA is used for key pair generation. Furthermore, certificates Cj are generated by the TBS and issued to all RSUs. When a vehicle is in roaming mode, symmetric keys are generated between the HRSU and FRSU using the AES-256 algorithm and are sent to the vehicle by the HRSU for authenticating the FRSU.
2. Registration—The vehicle and RSU register with the TBS to obtain the required system parameters.
RSU registration: Initially, each RSU has to be authenticated and certified by the TBS; it is assumed that the TBS is a trusted party. The RSU sends its identity, RSU_IDj, and location information to the TBS. After authentication, the RSU is provided with a certificate that is saved in the RSU’s local database and in the 5GCN. After checking the legitimacy of the RSU, the TBS selects an integer and a random number and computes RSU_SKj (secret key) and RSU_PKj (public key). The TBS sends the RSU its secret key through a secure channel and in turn stores {RSU_IDj, RSU_SKj} in its RSU list.
Vehicle registration: During this phase, the vehicle sends its identity, V_IDj, to the TBS. The TBS selects a random number and calculates a pseudonym, P_IDj = H(V_IDj || STBS || nj). The TBS stores V_IDj in its database and forwards P_IDj to Vj. The vehicle in turn saves P_IDj in the OBUj. Subsequently, the registration details stored in the TBS are also shared with the 5GCN.
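The pseudonym computation P_IDj = H(V_IDj || STBS || nj) can be sketched directly; SHA-256 is assumed here as the hash H, and the sample identity, TBS secret, and nonce are placeholders, not values from the scheme:

```python
# Sketch of the pseudonym computation P_IDj = H(V_IDj || STBS || nj).
# SHA-256 is an assumed choice of H; the sample inputs are placeholders.
import hashlib

def pseudonym(v_id: bytes, s_tbs: bytes, nonce: bytes) -> str:
    """Concatenate V_IDj, the TBS secret, and the random number, then hash."""
    return hashlib.sha256(v_id + s_tbs + nonce).hexdigest()

p_id = pseudonym(b"VEH-001", b"tbs-secret", b"nonce-42")
```

Because H is one-way, an eavesdropper who sees P_IDj cannot recover V_IDj, yet the TBS can always regenerate and verify the pseudonym from its stored records.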
3. Mutual Authentication—All three phases of the authentication protocol, along with mutual authentication, are explained in detail below.
The Vj sends a handoff request packet Rq_ID = (asyencr((P_IDj), R_Vj), HRSU_pkj) to the HRSU. Upon receiving Rq_ID, the HRSU forwards it to the 5GCN along with its HRSU_Cj. In the 5GCN, the AUSF checks the legitimacy of the HRSU through certificate verification. The request is then forwarded to the UDM, which, with the help of the SIDF, obtains the vehicle ID by decoding the received message; thus, the identity of the vehicle, V_IDj, is verified. The HRSU then shares the symmetric keys of itself and the FRSU with the Vj, Resp_ID (KSHRSU, KSFRSU), and forwards the Vj ID, (P_ID)HRSU, to the FRSU. The Vj sends the start_packet to the FRSU to initiate a handoff. The FRSU in turn sends a request packet, (Rq_P_ID)FRSU, to the Vj. The Vj responds with its identity information packet, Resp_(P_IDj)Vj, to the FRSU. Once the FRSU receives the Vj identity information, it is compared with P_IDj for verification. Once verified, the FRSU sends a certificate and the message DS2AN_START to the Vj to signal the start of the DS2AN authentication procedure. On reception of this message, the Vj first verifies the certificate using the symmetric key of the FRSU, received from the HRSU. On successful verification, the Vj generates a pre-master key, Rprmkey, and sends DS2AN_AuthCode = (5 parameters, (Rprmkey, P_IDj) FRSU_pkj) to the FRSU. The decrypted (Rprmkey, P_IDj) FRSU_pkj gives the pre-master key and P_IDj. At the FRSU, the input is reconstructed and compared with the predefined values, the legitimacy of the Vj is decided, and (AUTH_Success) is sent to the Vj. Thus, mutual authentication is successful. The authenticated Vj now generates the key Kseaf in the same way as the FRSU. The key Kseaf is used by the Vj and the RSU for establishing communication.
3.2.3. Security Analysis of the DS2AN
Authentication, being an important aspect of the IoV system, protects against attacks due to malicious nodes entering the system [
38,
39]. Authentication can protect IoVs from internal and external attacks. Confidentiality, integrity, and availability are the three basic security requirements for wireless networks [
40]. Confidentiality is preventing unauthorized nodes from reading the contents of data packets. Availability is permitting the authorized users to view data information. Integrity is avoiding unauthorized modifications for data packets. The attacks selected in this work are the major attacks that hamper these requirements. Addressing these attacks suffices to ensure reliable communication. In [
41,
42,
43], the authors have discussed several attacks that need to be addressed in vehicular networks to assure confidentiality, availability, and integrity of the data. The proposed authentication system has addressed attacks that occur on vehicular networks in addition to attacks addressed in [
44], such as Sybil, DoS, and masquerading attacks and message tampering, in addition to providing anonymity.
Sybil Attack: In this attack, an attacker disseminates multiple messages using fake IDs by producing many false identities to interrupt the normal mode of operations of the IoVs. This attack can be resolved using the DS2AN, as all the participating entities will be registered with the network; only they can receive the parameters sent by the network. Therefore, no attacker can access the network without registering to the network.
Mutual Authentication: A security feature of 5G, mutual authentication helps prevent the spoofing of messages. Authentication amongst the Vj, HRSU, and FRSU is made robust by the use of the DS2AN, since authentication is performed at both ends.
Vehicle Identity Protection: The encrypted P_IDj that is transmitted to the HRSU ensures the protection of the user's identity.
Passive Attack: The authenticated Vj is identified by the RSU based on a received code. Access to the network by a malicious node is prevented when the reconstructed values do not match. Thus, the DS2AN provides protection against passive attacks.
Node Impersonation Attack: In this attack, the attacker attempts to guess the valid ID of a registered user in the network. This attack is thwarted by the DS2AN, since the attacker would require knowledge of the training performed at both the Vj and the RSUs.
MitM Attack: The attacker acquires the P_IDj and tries to modify it. However, with a reduction in the dimensionality of the code containing P_IDj, the attacker is prevented from making modifications to the P_IDj.
Masquerading Attack: The attacker pretends to be another vehicle by using the other vehicle’s identity and masking their own identity, resulting in a Masquerading Attack. This attack is thwarted due to the mismatch of the DS2AN_AuthCode.
Message Tampering: One of the common attacks in which exchanged messages of V2V or V2I communication can be altered by the attacker. This can be prevented by training of the Vj and RSUs using the DS2AN.
DoS attack: This attack aims to make valid resources of a system inaccessible. Here, the attacker constantly overloads the network by sending messages containing P_IDj, with the purpose of stealing sensitive data, consuming bandwidth, or congesting the network. This type of attack has been successfully prevented through the identification of malicious nodes when the reconstructed values of the code do not lie in the range of the pre-defined values.
Eavesdropping Attack: This attack is avoided due to the training model present at the Vj and RSU.
Anonymity: This aspect is important for vehicle confidentiality protection. In this scheme, vehicles communicate using the pseudonym P_IDj = H(V_IDj || STBS || nj). The real identity of the vehicle can be restored using the secret value STBS.
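The pseudonym computation above can be sketched directly. This is a minimal illustration; the concrete hash function and field encodings are assumptions, since the scheme does not fix them here.

```python
import hashlib

def pseudonym(v_id: bytes, st_bs: bytes, n_j: bytes) -> str:
    """P_IDj = H(V_IDj || STBS || nj), with SHA-256 assumed as H."""
    return hashlib.sha256(v_id + st_bs + n_j).hexdigest()

# Placeholder identity, TBS secret, and nonce for illustration.
p_id = pseudonym(b"vehicle-17", b"tbs-secret", b"nonce-01")
# The TBS, which knows STBS and the nonce, can recompute P_IDj and link it
# back to V_IDj; outside observers only ever see the pseudonym.
```

Because H is one-way, an eavesdropper who captures P_IDj cannot recover V_IDj without the secret value STBS.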
On successful completion of authentication, resources will be allocated to the vehicle by the FRSU. The next section discusses how the resource allocation is performed.
3.3. Resource Allocation during Handoff Using the DRL
Upon successful authentication, resources are allocated to the vehicles as per their requests. Generally, vehicle applications have two components: content-requesting tasks and computation tasks. For example, in the case of autonomous vehicular applications, the vehicle requests the road conditions from the RSU, which subsequently provides routes to the vehicles as computed results to avoid crashes. Meanwhile, vehicles transmit local routes and traffic conditions in real time so that the best driving routes can be calculated. The TBS is assumed to have a considerable amount of computational and storage capacity, since the range of its wireless communication is wide enough to reach all vehicles. As a result, the RSUs and TBS can work together to respond to user requests. Using full-duplex radio, which is based on sophisticated 5G wireless communication technology, vehicles may concurrently download content from and upload computation tasks to the RSUs/TBSs.
An AI-based controller has been deployed for the RSU. The controller is made up of environment information gathered from the IoVs and an AI-based agent. The agent takes an action, including caching decisions and RSU resource allocation, based on the current state and the reward. For example, when a vehicle travels from RSU1 to RSU2, the computation offloading and content downloading tasks shift from RSU1 to RSU2. When the resources available at RSU2 and nearby RSUs are insufficient to satisfy the vehicle's demands, the work is handed over to the TBS for collaboration.
Figure 5 depicts the block diagram of the DRL model. Here, the Supervised DQL algorithm is proposed for analyzing network activity and vehicle position in the IoV and allocating the communication channel or resources.
In this method, the details of vehicle location and vehicle position are stored in the RSU database. The environment is modeled as an MDP with the goal of maximizing the total expected reward per connection. The Bellman-Ford algorithm is used to find the shortest path between vehicles with the help of the Q-network. This shortest-path algorithm is used to calculate the overall mobile edge node distance and weight, and to update the RSU. The algorithm avoids frequent changeover of control between the RSU and the vehicle. The RSU node tracks the velocity of the vehicle, its location and direction of travel, and detects all the neighbors of the next forwarding node with the smallest distance and the fewest hops [
45,
46].
The proposed model explanation is as follows:
Node: Represents a vehicular node.
Position estimation: The position of the vehicle is an essential factor to be determined in the IoV, so as to know which RSU should serve it. The duration for which links exist between the RSUs and vehicles is calculated using the velocity.
Deep-Q-Learning: After collecting the status from the RSU, the system uses Deep-Q-Learning on the above details to build a scheme that directs the vehicles to upload tasks to the RSUs and download content from them. The agent allocates the resources dynamically for the various requirements.
Communication Resource Selection: During the movement of vehicles, the RSU transmits data based on the vehicle's request.
Bellman-Ford algorithm: The RSU will communicate and share the destination information with the vehicle. The Bellman-Ford algorithm is used to share the information using the shortest path found.
Cloud Server: The moving directions and positions of vehicles are stored on the cloud server by the RSU. All information regarding the location and vehicle ID is sent from the RSU to the cloud server. The cloud server acts as a centralized server that communicates with each RSU.
3.3.1. DQL Algorithm
DQL estimates the Q-values and is a substitute for the traditional Q-table. Instead of mapping each state-action combination to a Q-value, the neural network maps input states to (action, Q-value) pairs. The learning method in DQL employs two neural networks. The design of these networks is the same, but the weights differ. The target network receives the main network's weights every N steps. Using both networks makes the learning process more stable, and the algorithm learns more successfully. After selecting an action, the agent must carry it out and keep the target and main networks up to date by using the Bellman update equation (
Figure 6).
Algorithm Steps
The agent perceives the present condition of the network. If a chosen random number is less than or equal to epsilon, the agent takes a random action; otherwise, the DQN forecasts the Q-values and the agent takes the action with the highest Q-value.
The current state, action, reward, and next state are stored in the replay memory, along with the terminal variable.
When the agent has enough instances in the memory, the DQN is trained by sampling a certain set of experiences.
The collection of current states is regarded as a parameter, and the target values are computed as “Target = set_of_reward + gamma × numpy.max(target_net.predict(set_of_next_state)) × (1 − set_of_done)”. The terminal variable masks out the future-value term, so the target for a terminal state reduces to the reward alone.
Iteratively, the main and target network are updated.
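The target computation in the steps above, with the terminal masking made explicit, can be sketched as follows. This is a minimal numpy illustration: the array next_q stands in for target_net.predict(set_of_next_state), and the minibatch values are invented for demonstration.

```python
import numpy as np

def dqn_targets(rewards, next_q, done, gamma=0.95):
    """Regression targets for a sampled minibatch:
    target = r + gamma * max_a' Q_target(s', a') * (1 - done),
    so terminal transitions contribute only the immediate reward."""
    return rewards + gamma * next_q.max(axis=1) * (1.0 - done)

# Toy minibatch of two transitions.
rewards = np.array([1.0, 0.5])
next_q = np.array([[0.2, 0.8],    # non-terminal transition
                   [0.9, 0.3]])   # terminal transition
done = np.array([0.0, 1.0])
targets = dqn_targets(rewards, next_q, done)
# targets[0] = 1.0 + 0.95 * 0.8 = 1.76; targets[1] = 0.5 (terminal)
```

The main network is then regressed toward these targets for the sampled state-action pairs, while the target network is refreshed from the main network every N steps.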
3.3.2. Bellman-Ford Path Analysis
The Bellman-Ford method will always discover the shortest path. Although it is more time-consuming than Dijkstra's approach, it is more flexible, since it can handle graphs with negative edge weights. The Bellman-Ford algorithm addresses two major problems that arise in this procedure:
If the graph contains negative-weight cycles, a shortest path is not well defined and naive relaxation would continue shortening paths indefinitely; the algorithm detects this case.
An exponential number of relaxations can occur when the relaxation sequence is chosen incorrectly.
Bellman-Ford’s most crucial step is relaxation, which makes the distance estimate from one vertex to another more precise. Relaxation is achieved by comparing the estimated distance between two vertices with the other known distances and progressively shortening the calculated distance.
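A compact implementation of the relaxation-based procedure described above might look as follows. This is a generic sketch over an edge list, not the paper's RSU-specific code; the graph used at the end is an invented example.

```python
def bellman_ford(n, edges, src):
    """Shortest distances from src over an edge list of (u, v, w) triples.
    Returns None if a negative-weight cycle is reachable from src."""
    INF = float("inf")
    dist = [INF] * n
    dist[src] = 0.0
    for _ in range(n - 1):                 # at most n-1 rounds of relaxation
        updated = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:      # relaxation step
                dist[v] = dist[u] + w
                updated = True
        if not updated:                    # early exit: no edge improved
            break
    for u, v, w in edges:                  # extra pass: negative-cycle check
        if dist[u] + w < dist[v]:
            return None
    return dist

# Toy graph with one negative (but non-cyclic) edge weight.
dist = bellman_ford(4, [(0, 1, 4), (0, 2, 5), (1, 3, -2), (2, 3, 1)], 0)
# dist == [0.0, 4.0, 5.0, 2.0]: node 3 is cheapest via the negative edge.
```

Dijkstra's algorithm would not be safe on this graph because of the negative edge, which is exactly the flexibility the text attributes to Bellman-Ford.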
3.3.3. DRL for Resource Allocation
Resource allocation in V2I communications using the DRL method along with the RL framework is presented in this section.
Markov Decision Processes
The MDP formalism may be used to study learning agents mathematically. The vehicle’s status includes the vehicle’s position, velocity, and total size of the content requested by the vehicle (
Figure 7). The action is the allocation of resources to vehicles. The agent accepts various requests from vehicles and assigns resources, and thus vehicles may download content from and upload tasks to various RSUs.
In RL, a time-homogeneous Markov chain is one in which the transition probability is independent of time t; that is, Pr[st+1 = s′ | st = s, at = a] is the same for every t.
At every time instant t, the agent, which is the intended RSU link, observes a state st from the state space S, takes an action at from the action space A, and selects the appropriate resource based on the policy π. The Q-function, Q(st, at, θ), determines the decision policy π, where θ is the parameter of the Q-function; it may be acquired using deep learning.
Return and Policy
In RL, the objective is to take actions over time that maximize the expected value of the return (i.e., to select the optimum policy). Return and policy can be defined as follows:
The total discounted reward from time-step t is represented by the return Gt as Gt = r(t+1) + γ·r(t+2) + γ²·r(t+3) + … = Σ_{k=0}^{∞} γ^k·r(t+k+1), where γ ∈ [0, 1] is the discount factor.
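The return from time-step t can be evaluated for a finite reward sequence by the standard backward recursion Gt = r(t+1) + γ·G(t+1), sketched here generically (the rewards are illustrative, not the paper's):

```python
def discounted_return(rewards, gamma=0.9):
    """G_t = r_{t+1} + gamma * G_{t+1}, evaluated backward over a finite
    reward sequence (the return is 0 beyond the last reward)."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

g0 = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
# g0 == 1.0 + 0.5*1.0 + 0.25*1.0 == 1.75
```

The discount factor γ controls how strongly future rewards, such as resources delivered several handoffs later, influence the present decision.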
A policy π is a mapping from states to actions; it specifies the action to be taken in each state. It is independent of time. A policy directs the choice of action at a specific state.
Q-learning is used to determine the optimal policy for allocating resources in V2I communications so as to maximize the long-term expected accumulated discounted reward. Once the Q-values, Q(st, at), are known, an updated policy, π, may easily be created by performing the greedy action, at = argmax_a Q(st, a) (i.e., the long-term accumulated rewards are used for taking an action).
With the Q-values, the optimal Q-function, Q*, can be found based on the following update equation, which requires no knowledge of the system dynamics: Q(st, at) ← Q(st, at) + α·[r(t+1) + γ·max_a Q(st+1, a) − Q(st, at)], where α is the learning rate.
Once an optimal policy has been determined through training, it may be used to choose resources for V2I links so as to satisfy the link latency constraints.
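The tabular Q-learning update just described can be sketched as follows. The two states and two actions are invented stand-ins (e.g., two candidate resource blocks), not the paper's V2I state and action spaces.

```python
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One step: Q(s,a) <- Q(s,a) + alpha * (r + gamma*max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[s_next])
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])

# Two states, two actions, Q-table initialized to zero.
Q = {0: [0.0, 0.0], 1: [0.0, 0.0]}
q_update(Q, s=0, a=1, r=1.0, s_next=1)
# Q[0][1] == 0.1 * (1.0 + 0.9 * 0.0 - 0.0) == 0.1
# The greedy policy at state 0 now selects action 1.
```

Repeating such updates over many observed transitions drives Q toward Q*, after which the greedy policy at = argmax_a Q(st, a) is used to assign resources.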
At each iteration, the Q-network updates its weights by minimizing the loss function obtained from the same Q-network with old weights on a data set D, as provided by (17): L(θ) = E_(st, at, rt, st+1)∼D [(rt + γ·max_a Q(st+1, a; θold) − Q(st, at; θ))²], where rt denotes the related reward.
5. Conclusions
With Internet of Vehicles technology moving towards networking and intelligentization, onboard operating systems, newer automotive electronics, more in-vehicle communication, and newer service platforms providing enhanced security are becoming research hotspots. This work has discussed the best RSU selection, the authentication of vehicles for verifying their legitimacy, the resource allocation technique, and finding the shortest path to the destination node for communication during handoff. Since handoff is essential for providing a seamless connection, the handoff decision and authentication need to be performed quickly. This has been demonstrated by the RL-MDP and DS2AN methods. Resource allocation and finding the shortest path are done using the DRL and the Bellman-Ford algorithm, respectively.
The proposed DRL relies on DQL to find the optimum resources for different vehicular applications. The dataset used for training the parameters has been obtained from a public domain source. The RSU tracks the speed, location, and direction of travel, and detects all the neighbors of the next forwarding node with the smallest distance and the fewest hops, so as to reduce the handoff delay. The Bellman-Ford algorithm is used to find the shortest path between the RSUs with the help of the Q-network. A reduction in handoff delay of 13% has been achieved by minimizing the decision and authentication delays. Compared to cryptographic protocols, the DS2AN algorithm requires fewer cryptographic key generations and message exchanges between the vehicle and the network. The efficiency of the proposed algorithm in thwarting several security attacks has also been established through simulation.
This study demonstrates that the DS2AN is more proficient when it comes to authenticating nodes, with minimum delays. Furthermore, the results demonstrate that the DRL provides a reliable packet delivery of 84% and a throughput of 92 Mbit/s, while upholding tolerable delay levels of 0.1 ms.
The requirement of a large data set for training the model is a major limitation of the proposed work. The model is suitable only for infrastructure-based communication. The higher the density of vehicles, the greater the memory requirement at the RSU.
Though the attacks addressed in this work ensure secure IoV communication and a robust model, several other attacks, such as Wormhole and route modification attacks, remain to be investigated. As future work, these attacks can be addressed in the algorithm, along with an emphasis on the memory requirement. A further reduction in the delay encountered during handoff can be achieved through Mobile Edge Computing (MEC) along with DQL.