1. Introduction
Autonomous vehicles are a vital part of modern advanced transportation systems and are considered one of the fastest-growing technologies at present. An autonomous vehicle perceives its environment and uses that perception to decide how to direct the agent [1]. Decision-making is the main module of an autonomous vehicle. Thus, it is vital for an autonomous vehicle to learn to find an optimal path for traversal. This work proposes integrating a Reinforcement Learning method into an autonomous vehicle so that it can make optimal decisions while traversing a dynamic environment.
Reinforcement Learning is a kind of machine learning that gains experience by interacting with the environment and evaluates feedback to improve a system’s performance in making behavioral decisions [2]. It improves the system’s performance through trial-and-error experience with a dynamic environment. Reinforcement Learning provides qualitative and quantitative frameworks, through rewards and punishments, for understanding and adapting decision-making [3]. The decision-maker carries out a particular operation by maximizing reward in a specific circumstance [4]. Deployed across a range of machines and software, it looks for the best possible behavior to adopt in a particular condition. Usually, the agent in a Reinforcement Learning model interacts with the environment through perception and action [5]: it receives an observation from the environment as input and produces an action as output. Thus, Reinforcement Learning involves learning to make decisions, mapping situations to actions and maximizing reward signals [6].
A Reinforcement Learning-based agent decides on its own how a given task will be performed, using a training data-set where one is available. However, in the absence of a training data-set, the agent has to learn from its experience. To learn to make optimal decisions, the vehicle must explore the same environment many times. A balance of exploitation and exploration is thus needed for the agent to learn to find better goals [7]. Exploitation means acting on what the agent already knows about the environment and what it currently believes gives the best results [7]. Exploration, on the other hand, means discovering new conditions and features of the world and finding a better path to the goal than the one the agent already knows. Typical autonomous vehicle systems are limited to specific mapped models. Including a Reinforcement Learning model in autonomous vehicles makes the agent able to operate in a dynamic environment through exploitation and exploration. As a consequence, autonomous vehicles will be able to make path-finding decisions even in unknown environments [8].
In this work, we use Double Deep Q-Learning [6] as the Reinforcement Learning algorithm for the agent to explore the environment. While the Q-Learning algorithm estimates the value of each action in a given state, Double Q-Learning addresses the overestimation of Q-values in basic Q-Learning [9]. The precision of Q-values depends on the actions that have been attempted and the states that have been explored. Hence, at the beginning of training the agent does not have the information needed to decide which action to take, and taking the maximum of noisy Q-value estimates as the value of the best action can give overoptimistic results. Thus, the Double Deep Q-Learning Network (DDQN) uses two separate networks, the Deep Q Network and the Target Network, to dissociate action selection from target Q-value generation. As a consequence, DDQN significantly reduces the overestimation of Q-values, which helps the agent learn steadily and train faster, and it compares favorably with the other learning algorithms considered in this work. DDQN is therefore suitable for an autonomous agent that makes decisions for optimal traversal by picking the action with the maximum Q-value while exploring the same environment. The strategy in which the agent picks the action with the maximum Q-value most of the time, and a random action with a small probability ε, is called the epsilon-greedy strategy [6].
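The epsilon-greedy rule can be illustrated with a minimal Python sketch. The function name `epsilon_greedy`, the list-of-floats representation of Q-values and the example numbers are illustrative assumptions and are not taken from the proposed implementation.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from a list of Q-values.

    With probability `epsilon` a random action is explored; otherwise the
    action with the largest Q-value is exploited.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit


# Hypothetical Q-values for five driving actions (illustrative only).
q = [0.2, 1.3, 0.7, -0.4, 0.1]
greedy_choice = epsilon_greedy(q, epsilon=0.0)  # epsilon = 0 always exploits
```

With `epsilon=0.0` the rule reduces to a plain arg-max over the Q-values, and with `epsilon=1.0` it is purely random; intermediate values trade exploration against exploitation.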
This work also includes detecting and classifying obstacles along the vehicle’s way while navigating. The vehicle takes data on obstacles on rocky, rough and bumpy surfaces through its sensor. We have mainly used a vision sensor to implement the proposal. The sensor data are fed to the agent, and the decision is taken based on the fed conditions. Faster R-CNN [10] is primarily applied in the prototype as the vehicle tries to detect and identify objects while navigating. Faster R-CNN is currently a well-established algorithm for object detection. The R-CNN and Fast R-CNN algorithms rely on the selective search algorithm to generate region proposals. However, in Faster R-CNN [11], selective search is eliminated and the network itself learns the region proposals, which makes this algorithm faster than its predecessors. It uses a convolutional network for both region proposal and object detection, making it suitable for real-time object detection.
Several highway decision-making strategies [12,13] have been implemented with deep Reinforcement Learning; in this research, the deep Q-Learning approach is combined with the Faster R-CNN method so that an autonomous agent can also detect and avoid obstacles along its way while traversing. The deep Q-Learning and Faster R-CNN algorithms have each proven successful, for autonomous driving strategy and object classification, respectively; fusing the two combines their benefits in autonomous vehicle navigation.
Figure 1 shows the proposed autonomous vehicle model based on Reinforcement Learning. The proposed model merges the Double Deep Q-Learning Network (DDQN) and Faster R-CNN and integrates them into an autonomous vehicle so that it can make maneuvering decisions while classifying and avoiding objects and obstacles on its way. The proposed model is tested in a game environment that resembles real-world scenarios.
The major contributions of this research work are as follows:
This research presents the development of a combination of two learning approaches, i.e., Double Deep Q-Learning and Faster R-CNN, for an autonomous vehicle in order to identify obstacles and navigate properly. It thereby integrates the benefits of the two approaches and supports autonomous navigation and obstacle avoidance in a stochastic vehicular environment.
Real-world testing of algorithms on autonomous vehicles is time-consuming and expensive; therefore, a dynamic game engine simulator is used for training and validating the proposed model.
This paper is organized as follows. Section 2 presents the related works. Section 3 discusses the object classifier methodology through Faster R-CNN. The application of Reinforcement Learning in the autonomous vehicle is described in Section 4. In Section 5, the implementation is described in detail. Section 6 contains the experimental results with analysis. Section 7 presents the discussion. Finally, Section 8 concludes the paper.
2. Related Works
A primary approach to object detection is to select a few regions of interest and apply a Convolutional Neural Network (CNN) to them [14]. Erhan et al. propose the Region-based Convolutional Neural Networks (R-CNN) to minimize the interface by focusing on a single region at a time [10]. In this work, we use Faster R-CNN to make object detection more efficient. Other object detection algorithms, such as R-CNN, apply selective search to find regions; however, Faster R-CNN applies a separate network to predict region proposals. The bounding box values are predicted by reshaping the proposed region. In [4], the authors used a region-based CNN using deep learning for road obstacle detection [15]. However, our proposed model does not only let an agent identify obstacles; it also determines actions based on them by integrating DDQN.
Min et al. present a related work that addresses driving policy on highways using Reinforcement Learning [16]. Their proposed model involves training a Driving Assistance System supervisor by deep Reinforcement Learning. In our proposed model, we have used a game engine simulator that differs from the driving simulator used in [16]. The driving simulator used in [16] is implemented with Unity ML-Agents for a static environment, while we have implemented our system in the dynamic game environment of GTA V. Real-world testing of algorithms on autonomous vehicles is time-consuming and expensive, so the authors of [17] present a visually and physically realistic simulator and test it on a quadrotor autonomous agent. However, as mentioned earlier, we used a game engine as a simulator to test our algorithm, because the scenarios of that game closely resemble the real world. As in our proposed approach, Reinforcement Learning has been used as a decision-making method in [18], which focuses on combining longitudinal and lateral control in a traffic overtaking maneuver. In our work, the maneuvering decisions include lane changing along with acceleration, deceleration and stopping as needed in a dynamic environment.
As human-driven vehicles and autonomous vehicles coexist on the road, efficient maneuvering of autonomous vehicles has become a necessity. In [19], a regret theory is adapted based on human drivers’ lane-changing behavior. The predicted decision is integrated, and DDQN is used in training the autonomous vehicle controller. Our proposed model differs in that we train the autonomous vehicle to determine its path based on the distance, calculated in real time, from other vehicles or obstacles. In [20], the author presents longitudinal control of autonomous land vehicle models using parameterized Reinforcement Learning. It is mostly implemented using the PBAC algorithm and is different from the Double Deep Q-Learning algorithm that we have used in our previous work [21]. The authors in [22] describe eight extensions of Reinforcement Learning, which consist of adaptive heuristic critic (AHC) learning, Q-Learning and three further extensions to the basic methods to accelerate learning. Our proposed model focuses on Deep Reinforcement Learning to train the autonomous agent throughout. In [23], the authors present a Reinforcement Learning-based approach for autonomous helicopter flight in which a dynamic model is first created with the help of a pilot flying the helicopter. The approach then uses Differential Dynamic Programming (DDP) to learn a controller that is optimized for the model. Our application of Reinforcement Learning to land vehicles works in such a way that the proposed model takes data from a game environment while traversing and learns to make decisions with the Double Deep Q-Learning algorithm.
Thus, Reinforcement Learning is considered an exciting learning method that requires performance feedback from the environment. So far, Reinforcement Learning has solved various learning problems. This paper explores Reinforcement Learning as a decision-maker for maneuvering and path-finding in unknown environments. The proposed work is promising because it not only implements a decision-making method based on Double Deep Q-Learning for autonomous vehicles but also integrates Faster R-CNN. As a result, our proposed model can avoid obstacles while traversing a dynamic environment. The Double Deep Q-Learning algorithm is used to train the autonomous vehicle for navigation control. The autonomous agent makes path-finding decisions on its own and avoids obstacles, since the distance from obstacles is calculated in real time to make maneuvering decisions, which include lane changing, accelerating, decelerating and stopping.
4. Distributional Agent for Autonomous Driving
4.1. Double Deep Q-Learning Network (DDQN)
H. van Hasselt proposed DDQN [9] as an extension of his earlier proposal, applied to the Deep Q Network (DQN) [29]. DQN is one of a handful of Q-Learning-based methods, and estimation errors cause overestimation in these Q-Learning-based algorithms. Overestimation leads to overoptimistic value estimates and degraded performance. DDQN not only reduces the overoptimistic value estimates but also performs better than DQN in a number of virtual environments. To do so, DDQN maintains two Q-functions and decouples action selection from action evaluation. DDQN and DQN therefore differ in the target value they regress toward.
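In the standard notation of the DDQN literature, the DQN target is r + γ max_a Q(s', a; θ⁻), while the DDQN target is r + γ Q(s', argmax_a Q(s', a; θ); θ⁻), where θ denotes the online network and θ⁻ the target network. The sketch below illustrates the difference in plain Python; the function names, the list representation of the network outputs and the example numbers are assumptions made for illustration only.

```python
def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """DQN target: the target network both selects and evaluates the action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def ddqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """DDQN target: the online network selects the action and the target
    network evaluates it, which reduces overestimation."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]


# Hypothetical next-state Q-values from the two networks (illustrative only).
next_q_online = [1.0, 2.0, 0.5]
next_q_target = [1.5, 1.0, 3.0]
y_dqn = dqn_target(1.0, next_q_target, gamma=0.9)
y_ddqn = ddqn_target(1.0, next_q_online, next_q_target, gamma=0.9)
```

In this example the target network alone would back up its own largest (possibly overestimated) value, whereas the DDQN target backs up the target network's value for the action preferred by the online network.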
4.2. Markov Decision Process for Path Circulation
In this work, path determination for autonomous driving is formulated as a Markov Decision Process (MDP). At each step the agent chooses an action and immediately receives a reward for it. An MDP is described by the tuple {S, P, A, R, γ}, as introduced earlier. For better understanding, a short summary of the MDP is given below:
s ∈ S, where S is the finite state space, which contains a grayscale image from the vision sensors of the agent.
P is the state-transition function.
a ∈ A, where A is the discrete action space available to the agent.
R is the reward function.
γ is the discount factor for delayed rewards, where γ ∈ [0,1].
For high-dimensional observations, the MDP states s ∈ S can be handled by using Deep Neural Networks [30].
Figure 3 shows the surrounding coverage obtained from three vision sensors placed at the front.
The autonomous driving agent has five distinct actions. The discrete action space A comprises forward, left, right, stop and deceleration. For forward and deceleration, 5 kph is added to or subtracted from the agent’s current speed. The agent’s speed is bounded to the range of 30 kph to 80 kph. The agent automatically adjusts its speed for vehicles within a specific distance so that it keeps a safe gap from the vehicle in front. When the vehicle in front suddenly slows down, or another vehicle cuts in suddenly ahead of our agent vehicle, the ‘stop’ action is triggered immediately.
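The action space and the speed bounds described above can be sketched as follows. The names `Action` and `next_speed` are hypothetical, and two behaviors are assumptions made here for illustration: that ‘stop’ sets the speed to zero, and that lane changes leave the speed unchanged.

```python
from enum import Enum


class Action(Enum):
    FORWARD = 0
    LEFT = 1
    RIGHT = 2
    STOP = 3
    DECELERATE = 4


SPEED_STEP_KPH = 5.0   # increment for forward / deceleration
MIN_SPEED_KPH = 30.0   # lower bound of the cruising range
MAX_SPEED_KPH = 80.0   # upper bound of the cruising range


def next_speed(speed_kph, action):
    """Return the agent's speed after applying one action."""
    if action is Action.STOP:
        return 0.0  # assumption: 'stop' brings the vehicle to a halt
    if action is Action.FORWARD:
        speed_kph += SPEED_STEP_KPH
    elif action is Action.DECELERATE:
        speed_kph -= SPEED_STEP_KPH
    else:
        return speed_kph  # assumption: lane changes keep the current speed
    # Keep the speed inside the allowed cruising range.
    return min(max(speed_kph, MIN_SPEED_KPH), MAX_SPEED_KPH)
```

For example, accelerating at 78 kph saturates at the 80 kph bound, and decelerating at 32 kph saturates at 30 kph.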
4.3. Data Preprocessing
The images of the surroundings are collected from three vision sensors, as shown in Figure 3. The output images from the vision sensors are cropped so that the model is not trained on the sky and the front portion of the vehicle. Following the NVIDIA model, the images are converted to 160 × 320 with 3 YUV channels. The images are then normalized (the image data are divided by a constant and 1.0 is subtracted). As noted in the description of the model, this avoids saturation and makes the gradients work better.
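A minimal NumPy sketch of the cropping, resizing and normalization steps is given below. Several details are assumptions rather than values from this work: the crop margins (`top`, `bottom`), the divisor 127.5 (chosen so that 8-bit pixel values map to [-1, 1]) and the nearest-neighbour resize. The colour-space conversion to YUV is left out to keep the sketch free of image-library dependencies.

```python
import numpy as np


def preprocess(frame, top=60, bottom=25, out_shape=(160, 320)):
    """Crop, resize and normalize one camera frame.

    frame: uint8 array of shape (height, width, 3).
    top / bottom: rows removed to drop the sky and the vehicle's front
    (hypothetical margins, not taken from the paper).
    """
    cropped = frame[top:frame.shape[0] - bottom]

    # Nearest-neighbour resize using index arrays.
    rows = np.linspace(0, cropped.shape[0] - 1, out_shape[0]).round().astype(int)
    cols = np.linspace(0, cropped.shape[1] - 1, out_shape[1]).round().astype(int)
    resized = cropped[rows][:, cols]

    # Assumed normalization: map [0, 255] to [-1, 1].
    return resized.astype(np.float32) / 127.5 - 1.0
```

Keeping the inputs in a small symmetric range is what helps the gradients behave well during training.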
4.4. Model Architecture Pattern
The primary objective of the agent is to map the perceived state s ∈ S to the next action in the action space A. All actions are taken in a stochastic driving environment. However, to achieve this mapping, the model needs to satisfy two conditions: (i) it must extract significant features from the images of the three vision sensors, and (ii) it must account for the inherent randomness of the environment when choosing a specific action.
To satisfy the first condition, the network must capture spatial information from the vision sensor images. A CNN handles this step, since CNNs are well known for extracting spatial features from images. Specifically, the large vision sensor images are reduced to a visual feature vector using three two-dimensional convolutional layers.
The second condition is satisfied by the DDQN scheme, which is suited to stochastic driving conditions. For every action, a return distribution is produced by the final fully connected layer. The value Q(s,a) can then be estimated as the expectation of the quantiles.
Furthermore, the maximum Q-value identifies the best action, which is selected from the Q-values available over the action space A.
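The step from per-action quantiles to a greedy action can be sketched as follows, assuming equally weighted quantiles so that the expectation is a plain mean. The function names and the example quantile values are hypothetical.

```python
from statistics import fmean


def q_values_from_quantiles(quantiles_per_action):
    """Estimate Q(s, a) for each action as the mean of its return quantiles
    (assumption: all quantiles carry equal weight)."""
    return [fmean(quantiles) for quantiles in quantiles_per_action]


def greedy_action(quantiles_per_action):
    """Return the index of the action with the largest estimated Q-value."""
    q_values = q_values_from_quantiles(quantiles_per_action)
    return max(range(len(q_values)), key=q_values.__getitem__)


# Hypothetical return quantiles for three actions (illustrative only).
example_quantiles = [
    [0.0, 1.0, 2.0],   # mean 1.0
    [1.0, 1.0, 4.0],   # mean 2.0
    [0.5, 0.5, 0.5],   # mean 0.5
]
best = greedy_action(example_quantiles)
```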
The flow diagram of the DDQN scheme for the proposed algorithm is presented in Figure 4. We use Keras to build the proposed network.
4.5. Hyperparameters
The network is based on the NVIDIA model, which NVIDIA used for an end-to-end autonomous driving test. The NVIDIA model itself is well documented. A deep convolutional network is well suited to supervised image classification or regression problems. Accordingly, the main focus is on adjusting the training images to deliver the best result. In addition, to obtain the best result, changes have been made to avoid over-fitting and to add non-linearity so that the predictions are accurate. The following adjustments have been added to the model:
A Lambda layer is introduced to normalize the input images so that the gradients work more smoothly and saturation is avoided.
An additional dropout layer is added after the convolution layers to avoid over-fitting.
ReLU is then used as the activation function to introduce non-linearity.
The network is trained with the Adam optimizer, with an explicitly set learning rate and epsilon, using mini-batches of 32 samples to obtain better accuracy. The Xavier initializer is used to initialize the network weights, and all inputs are normalized to a fixed range. The mean squared error is used as the loss function to measure how accurately the steering angle is predicted for each image.
Table 1 shows the hyper-parameters of the proposed network. The number of supports for Q is set to 200. The replay memory size is 5,000,000, and the discount factor γ is held fixed throughout training. An ε-greedy policy is used, in which ε is reduced progressively at every step from its initial value to its final value and is then held fixed. These settings are applied over 3 million training steps.
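A schedule of this kind can be sketched as a linear annealing function. The start value, end value and number of decay steps below are hypothetical placeholders, not the values used in this work.

```python
def linear_epsilon(step, eps_start=1.0, eps_end=0.1, decay_steps=1_000_000):
    """Linearly anneal epsilon from eps_start to eps_end, then hold it.

    All default values are illustrative assumptions.
    """
    fraction = min(max(step, 0) / decay_steps, 1.0)
    return eps_start + fraction * (eps_end - eps_start)


# Epsilon at the start, halfway through the decay, and after it has ended.
schedule = [linear_epsilon(s) for s in (0, 500_000, 3_000_000)]
```

Early in training the agent therefore explores almost at random, and once the decay has finished it exploits its Q-values most of the time while still exploring occasionally.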
4.6. Model Training
The agent has been trained in the game environment, based on the images collected from the agent’s cameras. The study assumes daylight for clear vision and known objects for classification. However, for diversification, we used the following augmentation technique together with a Python generator to produce an unlimited number of images. The images are selected randomly and their characteristics are changed to give the agent various kinds of scenarios, such as changing image brightness and shadows or flipping images between left and right. The random changes are listed below:
Randomly select the right, center or left image.
The steering angle is adjusted by a fixed offset for the left image.
The steering angle is adjusted by the opposite offset for the right image.
Randomly flip the image left/right.
Randomly translate the image horizontally, with the steering angle adjusted in proportion to the pixel shift.
Randomly translate the image vertically.
Randomly add shadows.
Randomly change the image brightness (lighter or darker).
Using the left/right images is useful for training the recovery driving scenario. The horizontal translation is useful for handling difficult curves.
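Three of the augmentations listed above (camera selection, left/right flip and brightness change) can be sketched with NumPy as follows. The steering offset of 0.2, its sign convention, the brightness range and all function names are assumptions made for illustration.

```python
import numpy as np

CAMERA_STEERING_OFFSET = 0.2  # hypothetical offset, not taken from the paper
CAMERAS = ("left", "center", "right")


def choose_camera(images, steering, rng):
    """Pick one camera at random and correct the steering angle for it."""
    camera = CAMERAS[rng.integers(len(CAMERAS))]
    offset = {"left": CAMERA_STEERING_OFFSET,
              "center": 0.0,
              "right": -CAMERA_STEERING_OFFSET}[camera]
    return images[camera], steering + offset


def random_flip(image, steering, rng):
    """Mirror the image left/right half of the time and negate the angle."""
    if rng.random() < 0.5:
        return image[:, ::-1], -steering
    return image, steering


def random_brightness(image, rng, low=0.6, high=1.4):
    """Scale pixel intensities by a random factor (lighter or darker)."""
    factor = rng.uniform(low, high)
    scaled = image.astype(np.float32) * factor
    return np.clip(scaled, 0, 255).astype(np.uint8)


def augment(images, steering, rng):
    """Apply the three augmentations in sequence to one sample."""
    image, steering = choose_camera(images, steering, rng)
    image, steering = random_flip(image, steering, rng)
    return random_brightness(image, rng), steering
```

Wrapping `augment` in a Python generator that loops over the recorded samples yields an effectively unlimited stream of varied training images.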
6. Results
The integration of Faster R-CNN and DDQN shows the optimum result in autonomous exploration. The data-set that is used here to train the image classifier shows the accuracy of %. This accuracy level represents the effectiveness of the classifier to identify any object which may appear before the autonomous vehicle. Acquiring values from Faster R-CNN, the reward function can be manipulated, which accelerates the efficiency in the decision-making process and enables safe autonomous exploration of vehicle.
The implementation of the proposed model in a game environment results in precise object classification, which assists the agent in making path-finding decisions that avoid obstacles.
Figure 8, Figure 9 and Figure 10 show the output of object detection on sample frames from the game environment. Here, nearby cars, objects and pedestrians are identified by Faster R-CNN with high accuracy, and detecting these obstacles makes autonomous navigation more efficient.
Further, Table 2 lists the braking distance measurements for the agent in dry road conditions. Using these values, the agent compares its braking distance with the distance to surrounding cars. A parameter estimator algorithm (PEA) [31] is used to calculate the distance between our agent and other cars. Figure 11 portrays the braking situation in the GTA V environment, which is a visual representation of the braking distance calculation of the proposed algorithm. When any car is detected within the braking distance, the agent is warned. The agent then reduces its speed accordingly and finds a possible path to traverse.
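The braking check can be illustrated with a simple kinematic sketch. It assumes the textbook relation d = v² / (2μg) and a friction coefficient of 0.7 for a dry road; both are assumptions made here, and the values in Table 2 may have been obtained differently.

```python
GRAVITY_M_S2 = 9.81
DRY_ROAD_FRICTION = 0.7  # assumed friction coefficient for a dry road


def braking_distance_m(speed_kph, friction=DRY_ROAD_FRICTION):
    """Braking distance in metres from the kinematic relation v^2 / (2*mu*g)."""
    speed_m_s = speed_kph / 3.6
    return speed_m_s ** 2 / (2.0 * friction * GRAVITY_M_S2)


def should_slow_down(gap_m, speed_kph, friction=DRY_ROAD_FRICTION):
    """True when another car lies within the agent's braking distance."""
    return gap_m <= braking_distance_m(speed_kph, friction)
```

Under these assumptions, an agent travelling at 80 kph needs roughly 36 m to stop, so a car detected 20 m ahead would trigger a speed reduction.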
We run some tests on our model to evaluate the constructed object classifier. The counts of correct and incorrect predictions for each class determine the accuracy, and a normalized confusion matrix is used to evaluate them. We construct the matrix, which is shown in Figure 12. Here, the rows represent the target values (what the model should have predicted, i.e., the ground truth). The columns represent the predicted values (what the model predicted).
Furthermore, precision and recall are computed for each of the seven distinct classes, in the order Bicycle, Bus, Person, Motorcycle, Truck, Van and Car. Table 3 shows these values. Recall represents the ability to find the relevant instances in a data-set, and precision is the fraction of the data points the model labels as relevant that are actually relevant. Overall, the results show that the image classification model is effective at detecting objects that come in the way of our autonomous vehicle.
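Per-class precision and recall follow directly from a confusion matrix laid out as described above (rows as ground truth, columns as predictions). The sketch below uses a small hypothetical two-class matrix; the numbers are not taken from Table 3.

```python
def precision_recall(confusion):
    """Return a (precision, recall) pair for every class.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n = len(confusion)
    results = []
    for k in range(n):
        true_positive = confusion[k][k]
        predicted_as_k = sum(confusion[i][k] for i in range(n))  # column sum
        actually_k = sum(confusion[k])                           # row sum
        precision = true_positive / predicted_as_k if predicted_as_k else 0.0
        recall = true_positive / actually_k if actually_k else 0.0
        results.append((precision, recall))
    return results


# Hypothetical counts for two classes (illustrative only).
example_confusion = [
    [8, 2],   # true class 0: 8 correct, 2 predicted as class 1
    [1, 9],   # true class 1: 1 predicted as class 0, 9 correct
]
scores = precision_recall(example_confusion)
```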
Effect of Learning Rate Schedules
Among the various learning rate schedule methods, the ReduceLROnPlateau callback is examined here; it drops the learning rate by a factor after the monitored metric has remained unchanged for a given number of epochs.
The impact of different ‘patience’ values is explored, where ‘patience’ is the number of epochs to wait for a change before dropping the learning rate. A learning rate of 0.01 is used at first and is dropped by an order of magnitude each time by setting the ‘factor’ argument to 0.1. To observe the effect on the learning rate over the training epochs, a new Keras callback is created that records the learning rate at the end of each training epoch. The recorded learning rates are then plotted as a line graph to show how the learning rate is affected by the drops.
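The reduce-on-plateau rule and the per-epoch recording of the learning rate can be illustrated without Keras. The class below is a plain-Python sketch of the idea, not the Keras implementation, and the loss values are hypothetical.

```python
class ReduceOnPlateau:
    """Drop the learning rate by `factor` after `patience` epochs without
    improvement in the monitored loss (a simplified sketch)."""

    def __init__(self, lr=0.01, factor=0.1, patience=5):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.wait = 0

    def step(self, loss):
        """Update the state with one epoch's loss and return the current rate."""
        if loss < self.best:
            self.best = loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr *= self.factor
                self.wait = 0
        return self.lr


# Record the learning rate at the end of each epoch for a stalled loss curve.
scheduler = ReduceOnPlateau(lr=0.01, factor=0.1, patience=2)
epoch_losses = [1.0, 0.9, 0.9, 0.9, 0.9]  # hypothetical losses
lr_history = [scheduler.step(loss) for loss in epoch_losses]
```

Plotting `lr_history` against the epoch index gives the kind of staircase curve discussed for the different patience values.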
Here, Figure 13 shows line plots of the learning rate over the training epochs for each of the evaluated patience values. With the smallest patience value, the learning rate drops significantly within 20 epochs. With the largest patience value of 15, the learning rate drops only once.
From these plots, it can be expected that patience values of 5 and 10 will result in better performance, as a larger learning rate is used for some time before the rate is dropped to refine the weights.
Secondly, Figure 14 shows the loss on the training data-set for each of the patience values. The plot shows that patience values of 2 and 5 lead to rapid convergence of the model, possibly to a suboptimal loss value. With patience values of 10 and 15, the loss drops moderately until the learning rate falls below a certain level, after which significant changes in the loss can be noticed. This happens halfway through the run when the patience value is 10 and near the end of the run when it is 15.
Lastly, Figure 15 shows the training set accuracy over the training epochs for each patience value. It can be observed that the small patience values of 2 and 5 epochs indeed result in premature convergence to a less-than-optimal model, at around 65% and less than 75% accuracy, respectively. In contrast, larger patience values result in better-performing models, with the patience value of 10 showing convergence just before 120 epochs. The patience value of 15 continues to show volatile accuracy, given its nearly unchanged learning rate.
Therefore, these plots show how a learning rate that is decreased appropriately for the problem and the chosen model configuration can result in a skillful, converged and stable set of final weights, which is a desirable property of the final model at the end of the training run.
7. Discussion
This system runs the Faster R-CNN and DDQN algorithms simultaneously, which makes the exploration process smoother. Implementing only a DDQN model limits the situations the agent can handle. Therefore, the focus is kept on the integration of both models to make the exploration process secure and effective without interruption. It is also important to normalize the input data to remove unnecessary data. Further, the system’s accuracy and loss on validation data are calculated simultaneously to check the effectiveness of the algorithm. A comparison between the algorithm that we have used and other similar algorithms also illustrates the system’s performance.
Figure 16 and Figure 17 represent the training data collected from the GTA V game environment. We test the agent in scenarios that are close to a real-life environment (such as night time, daytime, foggy, rainy, sunny and crowded conditions). Here, Figure 16 portrays a graph of the raw training data, which contain everything in the environment that has been captured. However, some of these data are unnecessary for obtaining accuracy and better performance. Therefore, the raw data have been normalized, as shown in Figure 17. This figure represents 160,000 normalized training samples. By normalizing the data-set, more accurate results are obtained.
DDQN hyper-parameters determine the system’s performance. Tuning these parameters is thus important to get better results from the developed algorithm. The following figures depict the characteristics of the DDQN hyper-parameters. Figure 18 shows the validation accuracy of Double Deep Q-Learning, and Figure 19 shows the total loss obtained from the value loss function. Figure 20 illustrates eight line plots for eight different learning rates, where epochs are on the x-axis and accuracy is on the y-axis. Classification accuracy on the training data-set is marked in blue, whereas accuracy on the test data-set is marked in orange.
The plots show oscillations in behavior for the too-large learning rates of 1.0 and the inability of the model to learn anything with too-small learning rates of 1 × 10 and 1 × 10. It can be seen that the model is able to learn well with the learning rates 1 × 10, 1 × 10 and 1 × 10, although successively slower as the learning rates were decreased. With the chosen model configuration, a moderate learning rate of 0.1 is suggested which results in satisfactory performance on the train and test sets.
The discount factor dictates how far-sighted an agent is. Values that are too small make the agent care mostly about the immediate reward, and values that are too large make the agent pay almost the same attention to distant rewards as to immediate ones. The latter may confuse the agent regarding which action leads to a high or low return.
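The effect of the discount factor can be made concrete with a short sketch of the discounted return, G = r_0 + γ r_1 + γ² r_2 + …, together with the common rule of thumb that an agent effectively looks about 1 / (1 − γ) steps ahead. The function names and reward sequences are illustrative assumptions.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards weighted by gamma**t, computed from the last step back."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def effective_horizon(gamma):
    """Rule-of-thumb number of future steps that noticeably affect the return."""
    return 1.0 / (1.0 - gamma)


# A delayed reward of 10 arriving after 50 steps (hypothetical sequence).
delayed = [0.0] * 50 + [10.0]
short_sighted = discounted_return(delayed, gamma=0.9)
far_sighted = discounted_return(delayed, gamma=0.99)
```

With γ = 0.9 the delayed reward contributes almost nothing to the return, whereas with γ = 0.99 it still carries most of its weight.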
Figure 21 shows the average score for four different gamma values. It is evident that γ = 0.9 makes the agent short-sighted, and there is no significant change during 80 epochs. At another of the tested γ values, the average score fluctuates widely after the 50th epoch. Since γ = 0.99 shows a gradually increasing trend, it is used as the final discount factor.
The effectiveness of the proposed algorithm is measured by comparing it with two similar Deep Reinforcement Learning algorithms: Deep Q-Learning (Multi-Input) and Deep Q-Learning (Image), the latter of which works only on image input. These algorithms are compared in terms of the lane-changing decisions of an autonomous agent in the GTA V game environment. Figure 22 shows the lane-changing results of these algorithms and indicates that the proposed algorithm is more favorable than the others. The number of lane changes is used as a measure of how well the algorithm suits the task. From the graph in Figure 22, it can be seen that DDQN (Multi-Input) makes the lowest number of lane-changing decisions at the end of the training process, which indicates that its decision-making is more precise. Hence, the proposed approach is best suited to the Double Deep Q-Learning algorithm.
Reinforcement Learning selects actions based on Q-values generated by the Q-function, so we compare the different Q-Learning algorithms in terms of Q-values. Figure 23 shows the Q-values of the different Reinforcement Learning algorithms. The figure compares the average Q-values of these algorithms, which is one of the characteristics of the DDQN hyper-parameters. Q-Learning algorithms base their decisions on the Q-values estimated for each action. DDQN (Multi-Input), DQN (Multi-Input) and DQN (Image) have been considered for this comparison. The figure indicates that DDQN is more appropriate than the other Q-Learning algorithms for autonomous vehicles. DQN (Image), which is based only on image processing, shows the lowest average Q-value. DQN (Multi-Input) shows a better result than DQN (Image); however, its value is not consistent, so the DQN algorithm for Multi-Input can give inaccurate results at the time of decision-making. In contrast, the DDQN algorithm for Multi-Input shows the highest average Q-value with consistency, which makes the algorithm more efficient than the rest.
The graph in Figure 24 is generated from the average reward values obtained with our DDQN hyper-parameters. In this figure, three series (final training reward, final evaluation reward and training reward on the first try) are considered for the uninterrupted traversal of the agent. The green line shows the training reward values on the first try, which are very low. Better reward values are obtained during training through trial and error, as shown by the red dotted line. Furthermore, normalizing the evaluation reward values gives the deep red line, which represents the final reward values in the training process. The DDQN algorithm works on the basis of the final reward values in the game environment. The final reward value increases at a consistent rate, which indicates the precision of the proposed algorithm.
8. Conclusions
This work develops a model that integrates the Double Deep Q-Learning (DDQN) algorithm with Faster R-CNN in autonomous vehicles so that they can make navigation decisions while avoiding obstacles on their way. The DDQN algorithm has a reward-based policy with which a vehicle can make maneuvering decisions in a stochastic environment. In our experiments, the DDQN algorithm was more effective than the other non-distributional algorithms compared. Additionally, where most of the existing models for autonomous vehicles are constructed entirely on neural networks, our proposed model integrates Faster R-CNN with DDQN to detect any object that appears in front of the autonomous vehicle. Our proposed model can thus improve accuracy in safe and smooth decision-making even in unexpected situations. Unlike the existing systems, the proposed model is not confined to a mapped environment. Accordingly, our model can avoid the limitations exhibited by the existing systems in exploration scope and precision. However, the model has limitations, including the small number of integrated sensors, as the system uses only vision sensors to take data from the environment. The system has scope to perform better in stochastic environments if integrated with more sensors. Further, the system faces challenges in decision-making when it has to consider vehicles that are alongside it. Testing the algorithms on a real vehicle requires a large amount of environmental data from a dynamic environment, which is expensive and time-consuming to collect. A real autonomous vehicle might also face challenges in performing globally, given the infrastructural differences between countries.
Although the proposed model attains favorable results, the key recommendation for future work on this system is to integrate more sensors, such as Lidar and Sonar, to make it more efficient in decision-making and more accurate in path-finding by detecting obstacles. In the future, we hope to overcome the limitation of resources in order to implement the system on a real vehicle. In addition, we expect to increase the accuracy of decision-making in the presence of randomness and to take appropriate actions that comply with reality. Further, the object classifier also has considerable scope for improvement by lowering the average number of misclassifications and increasing the average accuracy. The confusion matrix indicates the percentage of accuracy acquired through these processes. Therefore, in the future, we hope to increase the number of training iterations for the object classes to overcome the misclassification problem.