1. Introduction
Traffic congestion is a worldwide problem: it affects a large part of the population as well as the economy, through delays in the delivery of goods and increased fuel consumption, and it makes travel times hard to estimate [1]. Additionally, traffic congestion can cause health problems, both from the pollution of gases emitted by cars and from the physical strain on drivers who spend hours in the same position inside a car. Thus, maintaining a good vehicular traffic flow seriously impacts people's quality of life and even safety [2]. For these reasons, studies [3,4] have proposed solutions to reduce congestion in large urban centers, typically through improvements in urban infrastructure, such as installing traffic signals where previously they did not exist.
With the aim of reducing congestion in large cities, relevant research and new technologies, such as the evolution of vehicles [5], have emerged in the last several decades. It is important to note that not only vehicles but also cities are changing as part of this evolution. The concept of smart cities is emerging: urban infrastructure is becoming smarter and interconnected [6]. These cities use various intelligent infrastructure mechanisms aimed at the well-being of the population.
Nowadays, autonomous vehicles are being developed by several companies [7,8]. Likewise, computer vision solutions using Deep Learning (DL) algorithms for detecting and tracking traffic lights have been proposed [9]. Many autonomous vehicles use Artificial Intelligence (AI) algorithms for detecting objects. Algorithms used in these vehicles include Region-based Convolutional Neural Networks (R-CNN) [10], the Faster Region-based Convolutional Network method (Faster R-CNN) [11], You Only Look Once (YOLO) [12], and the Single Shot MultiBox Detector (SSD) [13]. They are used to detect traffic signs, pedestrians, vehicles, and other objects on the road.
In [14], the authors proposed a solution to classify pedestrians, bicycles, motorcycles, and vehicles; several tests were carried out to train a DL algorithm, which reached an accuracy of 89.53%. In [15], a system was proposed for classifying cars, pedestrians, drivers, and cyclists, achieving a 90% accuracy rate. Both works used DL algorithms for image detection; however, they obtained accuracy values equal to or lower than 90%.
With the advances in autonomous vehicles using these algorithms, the traffic infrastructure of large urban centers has also been modified. AI-based approaches have been used to improve urban traffic; for example, investments in intelligent traffic lights have been made to reduce traffic congestion and traffic accidents [16]. Some of these traffic lights are considered intelligent because of the use of AI algorithms, the capture of images, or the inclusion of sensors. It is important to highlight that sensor-based solutions involve additional implementation costs.
Capturing and analyzing traffic images requires near-real-time processing. In addition to sensors, studies on intelligent traffic lights commonly use images to detect different types of vehicles, such as emergency vehicles [17,18]. Because waiting time in traffic is critical for emergency vehicles, studies commonly propose traffic lights that give priority to these vehicles through both audible sensors and images. One such system, described in [17], captures images of traffic, identifies a vehicle, and estimates its speed and its distance to the traffic light. However, communication through sensors can fail or generate false data due to problems in the equipment or the network. Thus, it is important to have a mechanism that does not depend on a communication network between cars and traffic lights. As previously stated, current solutions based only on image classification do not achieve reliable accuracy.
Many solutions for real-time image detection have explored different DL algorithms, and solutions based on the SSD [13] and YOLO [12] architectures have obtained the best performance results in the recent literature. A traffic light detection system using SSD was proposed in [19,20], and it obtained satisfactory response times with high accuracy. Similarly, the use of YOLOv3 [21] provided a faster processing speed while detecting objects with high accuracy, and this has been consolidated in the literature. However, in a traffic light context, where many images need to be processed in real time, the existing models must be improved to further reduce processing time without a negative impact on accuracy.
In this context, an improved version of YOLOv3, called the Priority Vehicle Image Detection Network (PVIDNet), is proposed in the present research. To this end, a lightweight design strategy for the PVIDNet model is implemented through an improved Dense Connection model, based on [22], using feature concatenation to obtain high accuracy, and the Soft-Root-Sign (SRS) [23] activation function is used to reduce the detection processing time. In addition, a control algorithm for an intelligent traffic light is proposed. The main goal of this control algorithm is to give priority to emergency vehicles at road intersections controlled by traffic lights. In this work, only ambulances, fire trucks, and police cars are considered emergency vehicles. Hence, the waiting time of these types of vehicles at traffic lights can be reduced, which is relevant in emergency events.
The proposed solution, composed of PVIDNet and the traffic control algorithm, was evaluated using a simulation tool often cited in related works about urban traffic [24,25,26,27,28], the Simulation of Urban MObility (SUMO), in which vehicular traffic follows the so-called First-In-First-Out (FIFO) principle. The simulation results show that the proposed solution improves traffic control performance, decreasing the waiting time and the total travelling time, especially for emergency vehicles.
The main contributions presented in this paper are summarized as follows:
A priority vehicle image detection network (PVIDNet) is proposed based on an improved YOLOv3 model using feature concatenation, and it presents a better detection accuracy than the original YOLOv3 and other image processing-based methods.
A lightweight model is presented, and the SRS activation function reduces the processing time spent detecting vehicles, maintaining a desirable detection accuracy.
An improved control algorithm for an intelligent traffic light is introduced based on the Brazilian Traffic Code (BTC), and a new proposal regarding the priority of emergency vehicles is considered.
A new Database (DB) that considers five types of vehicles—ambulances, police cars, fire trucks, buses, and regular cars—was created. Each vehicle was captured from three different angles (right, left, and front), and all images have the same resolution characteristics. To the best of our knowledge, there is no available DB in the current literature that considers all these characteristics. Note that each country has different models of emergency vehicles.
The basis of PVIDNet optimisation is the use of dense blocks, improved transition blocks, and the SRS activation function. These characteristics optimize the backbone network, enhancing feature propagation and improving network performance. In the testing phase, the accuracy of the machine learning algorithms in classifying the Fire Truck, Bus, Ambulance, Police Car, and regular Car images, considering the right, left, and front images together, reached values higher than 0.95.
Additionally, the proposed solution reduces the waiting time for priority vehicles by up to 50% compared to the FIFO strategy, which is still used in many related works [29]. This waiting time reduction is very important for emergency events.
The paper is organized as follows: Section 2 presents related works regarding the algorithms used for object detection. Section 3 describes the methodology used to obtain the proposed solution, the proposed model, PVIDNet, and the algorithm for an intelligent traffic light. The results achieved and discussions about the proposed model are presented in Section 4. Section 5 concludes the paper.
3. Methodology
In this section, the main steps followed in building the proposed intelligent traffic light solution are described. First, the database used in this work is presented. Then, the Deep Learning algorithm, the Lightweight Priority Vehicle Image Detection Network (PVIDNet), is introduced, together with the performance validation metrics used in the tests. Finally, the proposed control algorithm for an intelligent traffic light is presented, along with the simulation scenario configurations using the SUMO tool.
Figure 1 illustrates how the proposed intelligent traffic light is developed, representing a general flowchart of the methodology, showing the three main steps involved in obtaining the proposed solution. In general, these three steps can be summarized as follows:
The Database. This is a set of homogeneous images used to train the DL model. The set of images is extracted from the following databases: the COCO API, PASCAL VOC, Google Images, and ImageNet.
Development of the Deep Learning Algorithm. This comprises the training and testing phases used to obtain the proposed model, PVIDNet. It is important to note that, in the validation step, a video of urban traffic is used, in which each vehicle is identified and classified. For the performance assessment of the PVIDNet algorithm, the validation metrics accuracy, sensitivity, and F-measure are used.
The Proposed Traffic Light Algorithm. After the classification of the vehicles by the DL algorithm, the traffic light controller works according to the vehicle’s priority proposed in this work. For evaluating the improvement of the traffic operation using the proposed traffic light algorithm, a simulation scenario is implemented with the SUMO tool.
The steps are explained in more detail in the following.
3.1. Database
Due to the difficulty of finding a DB with images of different vehicle types with similar image characteristics, a new DB was built to carry out training and testing of the proposed solution. To accomplish this aim, four distinct sources were used: ImageNet [73], PASCAL VOC [74], the COCO API [75], and Google Images.
The DB used in this work is composed of five image classes: ambulances, fire trucks, police cars, buses, and regular cars. Each image class is subdivided into three subcategories, right, left, and front, which represent the capture angles of the images. In total, the DB contains 5250 distinct color images: 350 right, 350 left, and 350 front images per class, for a total of 1050 images per class.
The resolution of each image was normalized to 1280 × 720. Figure 2 shows DB images from the right subcategory, Figure 3 from the left subcategory, and Figure 4 from the front subcategory.
After creating the database, it is necessary to select the region of interest of each image used for training. The tool used was the Image Labeler from Matlab 2019. Figure 5 illustrates images of the DB in which the Image Labeler was used.
The coordinates of the images used in this work are presented in Table 1.
3.2. Development of the Proposed Deep Learning Algorithm
As previously stated, the proposed PVIDNet solution used in this work is an improved version of YOLOv3 [46].
Algorithm 1 summarizes the steps performed for object detection. Initially, the program and its variables are initialized. After this process, the created database is loaded and divided into 80% for training and 20% for testing. The training options, such as the number of iterations, the number of epochs, the learning rate, the learning rate factor, the momentum, and the penalty limit, are then defined. After executing these steps, the command for training the network is executed. After training and testing on the database, the resulting network model is validated with urban traffic videos.
Algorithm 1 Algorithm for Object Detection of the Proposed Solution.
1: Load data from the DB
2: Split the DB
3: Determine the input size of the network
4: Resize the images to the network input size
5: Read the training data
6: Select the training options
7: Train the network
8: Evaluate the network with the test data
9: Validate the network with urban traffic videos
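For concreteness, the sketch below expresses Algorithm 1 in Python with TensorFlow/Keras, the framework named later in this section. It is a minimal illustration under stated assumptions, not the authors' released code: the directory layout (one sub-folder per vehicle class), the seed, the epoch count, and the small stand-in classifier used in place of the full PVIDNet detector are all assumptions made to keep the sketch self-contained and runnable.

```python
import tensorflow as tf

IMG_SIZE = (416, 416)   # network input size (Section 3.2)
BATCH_SIZE = 8          # batch size reported in this work

# Steps 1-2: load the DB and split it 80%/20% (directory layout is an
# assumption: ambulance/, fire_truck/, police/, bus/, car/ sub-folders).
train_ds = tf.keras.utils.image_dataset_from_directory(
    "vehicle_db", validation_split=0.2, subset="training",
    seed=42, image_size=IMG_SIZE, batch_size=BATCH_SIZE)
test_ds = tf.keras.utils.image_dataset_from_directory(
    "vehicle_db", validation_split=0.2, subset="validation",
    seed=42, image_size=IMG_SIZE, batch_size=BATCH_SIZE)

# Steps 3-4: the loader above already resizes images to the input size.

# Steps 5-7: a tiny stand-in classifier replaces the full PVIDNet detector
# here; the real backbone is described in Section 3.2.1.
model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255, input_shape=IMG_SIZE + (3,)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="softmax"),  # five vehicle classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train_ds, epochs=10)

# Step 8: evaluate the network with the held-out test data.
model.evaluate(test_ds)
# Step 9: validation with urban traffic videos is performed frame by frame.
```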
PVIDNet was designed to detect objects at various scales, and it therefore needs features at those various scales. Consequently, the last three residual blocks are all used for detection.
PVIDNet is implemented using the TensorFlow framework with Keras, and the detection models are trained and tested on an NVIDIA Titan X server. Table 2 presents the network initialization parameters used in this work.
To match the input required by PVIDNet, the input images must be resized to 416 × 416 pixels. The batch size used in this work is 8. Adaptive moment estimation (Adam), based on [76], was used to update the network weights. Parameters such as the initial learning rate, weight decay regularization, and momentum are those used in the original YOLOv3. The transfer learning is based on [77].
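The text only states that the original YOLOv3 settings are kept; the sketch below shows one plausible way to express them in Keras. The numeric values are the widely cited darknet YOLOv3 defaults and are an assumption here, not values confirmed by this paper.

```python
import tensorflow as tf

# Assumed darknet YOLOv3 defaults (not confirmed by the paper):
INITIAL_LR = 1e-3     # initial learning rate
WEIGHT_DECAY = 5e-4   # weight decay regularization
MOMENTUM = 0.9        # momentum; beta_1 plays this role in Adam

optimizer = tf.keras.optimizers.Adam(learning_rate=INITIAL_LR,
                                     beta_1=MOMENTUM)

# In Keras, weight decay can be expressed as an L2 kernel regularizer:
conv = tf.keras.layers.Conv2D(
    64, 3, padding="same",
    kernel_regularizer=tf.keras.regularizers.l2(WEIGHT_DECAY))
```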
3.2.1. Lightweight Priority Vehicle Image Detection Network (PVIDNet)
In this work, a feature concatenation strategy is proposed. The feature maps learned from each block are concatenated to all subsequent blocks, being fed as inputs through pooling. Thus, the feature maps of all block outputs are concatenated together in the backbone network as inputs to the detection module. Through feature reuse and propagation, the input feature maps present an enriched representation power and provide additional information for feature learning, as defined in Equation (1):

$X_{\text{out}} = c\big(p(X_1), p(X_2), \ldots, p(X_n)\big)$    (1)

where $X_1, \ldots, X_n$ and $X_{\text{out}}$ represent the input and the output feature maps in the backbone network, respectively, $p(\cdot)$ represents the max-pooling operation, and $c(\cdot)$ represents the concatenation operation.
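Under the definitions of $p$ and $c$ above, a small TensorFlow sketch of Equation (1) follows; the feature-map shapes are illustrative assumptions.

```python
import tensorflow as tf

# Equation (1): earlier feature maps are max-pooled (p) and concatenated (c)
# with later ones along the channel axis. Shapes are illustrative.
x1 = tf.random.normal((1, 52, 52, 128))   # feature map from an earlier block
x2 = tf.random.normal((1, 26, 26, 256))   # feature map from a later block

p1 = tf.keras.layers.MaxPooling2D(pool_size=2)(x1)       # p(.): 52x52 -> 26x26
x_out = tf.keras.layers.Concatenate(axis=-1)([p1, x2])   # c(.): 128+256 channels
print(x_out.shape)  # (1, 26, 26, 384)
```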
A batch normalization layer and the SRS are used in the network. They are used for dimension reduction and accelerating convergence. The SRS can adjust the output through a pair of independent trainable parameters, presenting a better generalization performance and a faster learning speed.
The SRS activation function is defined in Equation (2):

$\mathrm{SRS}(x) = \dfrac{x}{\frac{x}{\alpha} + e^{-x/\beta}}$    (2)

where $\alpha$ and $\beta$ represent a pair of trainable positive parameters. The SRS presents a non-monotonic region when $x < 0$, which provides the zero-mean property; when $x \geq 0$, the bounded output avoids and rectifies a scattered output distribution. The SRS derivative is defined in Equation (3):

$\mathrm{SRS}'(x) = \dfrac{\left(1 + \frac{x}{\beta}\right) e^{-x/\beta}}{\left(\frac{x}{\alpha} + e^{-x/\beta}\right)^{2}}$    (3)

The SRS output is bounded, with range $\left[\frac{\alpha\beta}{\beta - \alpha e},\ \alpha\right]$.
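A minimal Keras implementation of Equation (2) is sketched below. The initial values of $\alpha$ and $\beta$ are assumptions for this sketch; the derivative of Equation (3) is obtained automatically by TensorFlow's autodiff during backpropagation.

```python
import tensorflow as tf

class SRS(tf.keras.layers.Layer):
    """Soft-Root-Sign activation, Equation (2):
    SRS(x) = x / (x/alpha + exp(-x/beta)), with trainable positive alpha, beta."""

    def __init__(self, alpha=2.0, beta=3.0, **kwargs):
        # Initial values are an assumption for this sketch.
        super().__init__(**kwargs)
        self.alpha_init = alpha
        self.beta_init = beta

    def build(self, input_shape):
        # Trainable scalars constrained to be non-negative.
        self.alpha = self.add_weight(
            name="alpha", shape=(),
            initializer=tf.keras.initializers.Constant(self.alpha_init),
            constraint=tf.keras.constraints.NonNeg(), trainable=True)
        self.beta = self.add_weight(
            name="beta", shape=(),
            initializer=tf.keras.initializers.Constant(self.beta_init),
            constraint=tf.keras.constraints.NonNeg(), trainable=True)

    def call(self, x):
        return x / (x / self.alpha + tf.exp(-x / self.beta))

# The output stays within [alpha*beta/(beta - alpha*e), alpha]:
x = tf.linspace(-6.0, 6.0, 5)
print(SRS()(x))
```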
In the experiments, Softmax and ReLU were tested for comparison.
The original YOLOv3 model contains several residual blocks, which bring a large number of parameters to the network. Many parameters lead to extended training time and slow down the detection speed of the model. Thus, the structure of PVIDNet needs to be optimized for real-time operation. In PVIDNet, a modification of YOLOv3 was performed, as shown in Figure 6.
The Dense Block solution [22] presents some advantages, such as computational and storage efficiency. For this reason, it is used in PVIDNet. DenseNet needs only half of the network parameters for the same prediction accuracy, decreasing the complexity of the model and accelerating the detection of vehicles. This approach gives PVIDNet a good image feature learning ability and improves vehicle detection accuracy.
The dense connection structure of the convolutional layers, i.e., the Dense Connection blocks, is used to replace the residual blocks located in the PVIDNet backbone network.
Each layer of the Dense Connection block outputs m feature maps, where m is the growth rate. The i-th layer of the block receives the feature maps of all preceding layers concatenated as its input; if the first layer has $k_0$ input feature maps, the i-th layer has $k_0 + m(i-1)$ input feature maps.
The Dense Connection block presents five densely connected units, as shown in Figure 6. Figure 6a shows the backbone structure of the original YOLOv3. In Figure 6b, each unit has a 1 × 1 convolutional layer, represented in grey with the label CB, which stands for Convolution-Batch Normalization with an activation function. Each unit also has a 3 × 3 convolutional layer, represented in blue in Figure 6b, in which each convolutional layer is followed by a batch normalization layer and the SRS activation function. The yellow block labelled cc in Figure 6b represents the feature concatenation.
In this network, the growth rate m is set to 32. Improved transition blocks are used before each Dense Connection block to perform the max-pooling and convolution steps; their two outputs are then concatenated to form the input of the next block. Thus, the overall number of parameters of the new network is reduced, and therefore the processing time is also decreased.
It is important to note that the Dense Connection block effectively strengthens feature propagation, alleviates the vanishing-gradient problem, and facilitates feature reuse. This block differentiates the data added to the network and preserves it. Thus, the network knowledge is retained, helping to base decisions on all feature maps of the network. This makes the proposed system applicable in real-time scenarios.
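The sketch below assembles these pieces as one reading of Figure 6b: five densely connected CB units (1 × 1 then 3 × 3 convolutions) with growth rate m = 32, preceded by an improved transition block whose max-pooling and strided-convolution outputs are concatenated. ReLU stands in for SRS to keep the sketch self-contained (the SRS layer sketched earlier could be substituted), and the 4m bottleneck width is a DenseNet-style assumption, not a value given in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

GROWTH_RATE = 32  # m, the growth rate used in PVIDNet

def cb_unit(x, filters, kernel_size):
    """CB unit: Convolution-Batch Normalization with an activation
    (the paper uses SRS; ReLU is substituted here)."""
    x = layers.Conv2D(filters, kernel_size, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    return layers.Activation("relu")(x)

def dense_connection_block(x, units=5, m=GROWTH_RATE):
    """Five densely connected units (Figure 6b): a 1x1 CB layer, a 3x3 CB
    layer producing m maps, and concatenation (cc) with all previous maps."""
    for _ in range(units):
        y = cb_unit(x, 4 * m, 1)   # 1x1 bottleneck (4*m width is an assumption)
        y = cb_unit(y, m, 3)       # 3x3 convolution producing m feature maps
        x = layers.Concatenate()([x, y])
    return x

def improved_transition_block(x, filters):
    """One reading of the improved transition block: max pooling and a strided
    convolution computed on the same input, with both outputs concatenated."""
    pooled = layers.MaxPooling2D(2)(x)
    conv = layers.Conv2D(filters, 3, strides=2, padding="same",
                         use_bias=False)(x)
    conv = layers.Activation("relu")(layers.BatchNormalization()(conv))
    return layers.Concatenate()([pooled, conv])

inputs = tf.keras.Input((416, 416, 3))
x = cb_unit(inputs, 32, 3)
x = improved_transition_block(x, 32)
x = dense_connection_block(x)
model = tf.keras.Model(inputs, x)
model.summary()
```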
In the experiments, the proposed PVIDNet model using the SRS activation function is compared with YOLOv3, PVIDNet using Softmax, and PVIDNet using ReLU.
3.2.2. Validation with Real-Time Videos
For validation, videos of vehicles were captured in real time in real scenes. Fifty videos were recorded. The length of each collected video was 40 s, a value chosen according to related studies [78]. The videos were recorded at 34.25 FPS with an EOS 550D camera at four different locations, under three occlusion statuses. It is important to note that the videos contained all types of vehicles considered in this study: ambulances, police cars, fire trucks, buses, and regular cars.
3.2.3. Model Validation Metrics
In this work, the following validation metrics are used: accuracy, sensitivity (recall), and F-measure. These metrics are computed from the true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN).
These metrics are defined as follows:

$\text{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$

$\text{Sensitivity (Recall)} = \dfrac{TP}{TP + FN}$

$\text{F-measure} = \dfrac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$

Precision is defined as follows:

$\text{Precision} = \dfrac{TP}{TP + FP}$
In this work, 10-fold cross-validation is performed to obtain the metrics for the validation of the vehicle classification.
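For concreteness, a minimal Python sketch of these metric computations follows; the counts in the usage example are hypothetical.

```python
def metrics_from_counts(tp, fp, fn, tn):
    """Standard definitions of the validation metrics used in this work."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)            # sensitivity
    precision = tp / (tp + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, recall, precision, f_measure

# Example with hypothetical counts for one vehicle class:
print(metrics_from_counts(tp=95, fp=3, fn=5, tn=97))
```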
3.3. The Proposed Traffic Light Control Algorithm
After the vehicles are classified by PVIDNet, which performs faster detection based on its activation function and feature concatenation, the proposed traffic light control algorithm is applied. Thus, the outputs of the proposed deep learning algorithm are used as inputs for the traffic light control algorithm based on vehicle priorities, in order to make a decision about the traffic flow.
The proposed traffic light control algorithm works according to Figure 7. It is formulated based on the BTC [71], but with some adaptations regarding the priority of emergency vehicles.
Initially, in the proposed algorithm for an intelligent traffic light shown in Figure 7, the traffic light on Road A is considered red. Note that the analysis only considers the red and green traffic light statuses; the yellow status is omitted from the diagram for simplicity. When the traffic light timer completes, the traffic light status on Road A is set to green and the traffic light status on Road B is set to red, as in a regular timer-based traffic light. When the timer has not completed, the right of way of priority vehicles is verified: when a priority vehicle appears on Road A, the traffic light on Road B is set to red and the traffic light on Road A is set to green. It is important to note that a priority index of vehicles is followed in this work; for instance, ambulances have a higher priority than regular cars. When the priority vehicle passes the traffic light on Road A, the flow restarts with the verification of whether the timer has completed. The same logic applies to the traffic light on Road B; for this purpose, a block in Figure 7 swaps the road variables.
The traffic simulation scenarios show that different vehicles can be present at the same time at a road intersection controlled by a traffic light. These vehicles are identified, and the priority order of each one is obtained considering the traffic light priorities presented in Table 3. When vehicles of the same class are detected on both roads, and this class is the highest priority at that moment at that road intersection, the vehicle on the road whose traffic light status is green has the right of way in crossing the intersection.
In the proposed solution, the traffic light controller gives the right of way to priority vehicles travelling on the road. The traffic light manages this automatically, which requires communication between the two traffic lights. These priority vehicles are emergency vehicles, such as ambulances and fire trucks. The order of priority chosen in this work is as follows: ambulance, fire truck, police car, bus, and regular car.
The priority index chosen in this work is shown in Table 3, in which 0 represents the highest priority and 4 represents the lowest priority. The index is used to detect emergency vehicles approaching the intersection. Thus, emergency vehicles have the highest priority, while a non-emergency vehicle such as a regular car has the lowest priority.
When a priority vehicle is detected by the traffic light controller, the traffic light status on its road is changed to the green phase, and the controller extends the duration of this phase until the priority vehicle passes through the intersection. It is worth noting that, in our experimental studies, ambulances, fire trucks, and police cars are considered emergency vehicles, and they have the highest priorities.
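A minimal Python sketch of this decision logic follows, using the priority indices of Table 3. The class names, the state encoding, and the detector interface are illustrative assumptions; the actual controller is the flowchart of Figure 7.

```python
# Table 3: 0 is the highest priority, 4 the lowest.
PRIORITY = {"ambulance": 0, "fire_truck": 1, "police": 2, "bus": 3, "car": 4}

def control_step(timer_done, detections_a, detections_b, state):
    """One decision step for the intersection of Roads A and B.
    detections_* are lists of vehicle classes reported by the detector;
    state is "green_a" or "green_b"."""
    if timer_done:  # regular timer-based switching
        return "green_b" if state == "green_a" else "green_a"

    best_a = min((PRIORITY[v] for v in detections_a), default=5)
    best_b = min((PRIORITY[v] for v in detections_b), default=5)

    if best_a < best_b:   # higher-priority vehicle on Road A
        return "green_a"
    if best_b < best_a:   # higher-priority vehicle on Road B
        return "green_b"
    # Tie (or no priority vehicle): the road already green keeps right of way.
    return state

# Example: an ambulance approaches on Road B while Road A is green.
print(control_step(False, ["car", "bus"], ["ambulance"], "green_a"))  # green_b
```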
The chosen scenario is implemented in the Simulation of Urban MObility (SUMO) [70], an open-source road traffic simulator with realistic road topologies.
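SUMO exposes a Python client, TraCI, through which an external controller can drive the simulated traffic lights. The sketch below shows the basic loop; the configuration file name, the traffic-light id, and the fixed phase string are assumptions standing in for the PVIDNet-based priority controller.

```python
import traci  # SUMO's TraCI Python client

# Assumed config file and traffic-light id for illustration.
traci.start(["sumo", "-c", "intersection.sumocfg"])
for step in range(1000):
    traci.simulationStep()
    # In the proposed solution, the phase would be chosen by the
    # priority controller sketched above instead of this fixed example.
    if step == 100:
        traci.trafficlight.setRedYellowGreenState("junction_tl", "GGrr")
traci.close()
```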
5. Conclusions
In this research, a priority vehicle image detection model was studied and implemented. Methods regarding image feature extraction and activation functions in DL were investigated and evaluated. In addition, a DB composed of five types of vehicles was created, considering the left, front, and right angles of image capture.
To improve the detection procedure and execution time, an improved version of YOLOv3 was proposed. Additionally, the incorporation of an improved version of DenseNet reduced the number of parameters used by PVIDNet, enhancing feature propagation and network performance. The SRS activation function presented a low processing time compared to other functions because it offers better generalization performance and a faster learning speed during model generation; thus, the deep network training process is accelerated. Performance assessment results demonstrated that the PVIDNet model reached an accuracy higher than 0.95 in vehicle image classification, as presented in Table 16, with even better results for emergency vehicles. Furthermore, when the proposed model was validated using video sequences, the same high accuracy was reached, as presented in Table 17. Based on the BTC, a traffic control algorithm that gives priority to emergency vehicles, such as ambulances, fire trucks, and police cars, was proposed. For the performance assessment of the control algorithm, simulation tests were performed using the SUMO traffic simulation tool. The simulation results showed a decrease of 50% in the average total waiting time for emergency vehicles compared to the FIFO strategy, as well as a decrease of 45% in their total travelling time. It is important to note that regular private cars are not negatively affected in terms of waiting time and total travel time. In the case of public transportation, represented by buses, a slight improvement in those same parameters was obtained.
The experimental results demonstrated that the proposed solution, composed of the Lightweight PVIDNet and a control algorithm for intelligent traffic lights, presents high accuracy with low complexity, as well as a fast image detection process, which are important features of intelligent traffic lights. Furthermore, the reduction of waiting time at traffic lights for emergency vehicles is clearly important in emergency situations.
In future work, we intend to explore other simulation scenarios, with other road intersections and vehicular traffic models. Additionally, we intend to develop a prototype of both the proposed PVIDNet model and the traffic light controller using embedded systems.