A New Multi-Branch Convolutional Neural Network and Feature Map Extraction Method for Traffic Congestion Detection

Jiang, Shan; Feng, Yuming; Zhang, Wei; Liao, Xiaofeng; Dai, Xiangguang; Onasanya, Babatunde Oluwaseun

doi:10.3390/s24134272

Open AccessCommunication

A New Multi-Branch Convolutional Neural Network and Feature Map Extraction Method for Traffic Congestion Detection

by

Shan Jiang

^1,2,3,

Yuming Feng

^1,2,*

,

Wei Zhang

^1,2,*

,

Xiaofeng Liao

³,

Xiangguang Dai

^1,2 and

Babatunde Oluwaseun Onasanya

⁴

¹

School of Computer Science and Engineering, Chongqing Three Gorges University, Chongqing 404100, China

²

Key Laboratory of Intelligent Information Processing and Control, Chongqing Three Gorges University, Wanzhou, Chongqing 404100, China

³

College of Computer Science, Chongqing University, Chongqing 400044, China

⁴

Department of Mathematics, University of Ibadan, Ibadan 200005, Nigeria

^*

Authors to whom correspondence should be addressed.

Sensors 2024, 24(13), 4272; https://doi.org/10.3390/s24134272

Submission received: 13 May 2024 / Revised: 21 June 2024 / Accepted: 29 June 2024 / Published: 1 July 2024

(This article belongs to the Section Intelligent Sensors)

Download

Browse Figures

Versions Notes

Abstract

:

With the continuous advancement of the economy and technology, the number of cars continues to increase, and the traffic congestion problem on some key roads is becoming increasingly serious. This paper proposes a new vehicle information feature map (VIFM) method and a multi-branch convolutional neural network (MBCNN) model and applies it to the problem of traffic congestion detection based on camera image data. The aim of this study is to build a deep learning model with traffic images as input and congestion detection results as output. It aims to provide a new method for automatic detection of traffic congestion. The deep learning-based method in this article can effectively utilize the existing massive camera network in the transportation system without requiring too much investment in hardware. This study first uses an object detection model to identify vehicles in images. Then, a method for extracting a VIFM is proposed. Finally, a traffic congestion detection model based on MBCNN is constructed. This paper verifies the application effect of this method in the Chinese City Traffic Image Database (CCTRIB). Compared to other convolutional neural networks, other deep learning models, and baseline models, the method proposed in this paper yields superior results. The method in this article obtained an F1 score of 98.61% and an accuracy of 98.62%. Experimental results show that this method effectively solves the problem of traffic congestion detection and provides a powerful tool for traffic management.

Keywords:

traffic congestion detection; image data; target detection; feature map; classification model

1. Introduction

As the transportation system continues to grow, the number of motor vehicles continues to rise, resulting in severe traffic congestion in some areas of large cities [1,2]. Efficient and reasonable traffic congestion monitoring helps transportation departments manage congestion problems more effectively [3,4]. Currently, there are three main methods that can be applied to traffic congestion monitoring. The first method is to detect congestion through induction coils, which can effectively measure the traffic flow and speed on the road [5,6]. An induction coil is a device used to measure vehicle speed, commonly used in traffic control and traffic monitoring systems. Its working principle is based on the changes in induced electromagnetic fields. The induction coil consists of a coil, usually buried in the road, with a length perpendicular to the direction of vehicle travel. When a vehicle passes over the speed measurement coil, the metal parts of the vehicle (such as the wheels) will disturb the electromagnetic field generated by the induction coil. This disturbance will cause changes in the induced current within the coil. The induction coil calculates the speed of the vehicle by measuring the changes in induced current [7].

The second method is to use vehicle GPS data for congestion monitoring [8,9]. GPS equipment can provide real-time feedback on vehicle position and speed. The principle of GPS speed measurement is based on the Doppler effect. The Doppler effect refers to the change in the frequency of the received signal when the transmitting source and the receiving source are in relative motion. In GPS speed measurement, after receiving the signal transmitted by the satellite, the receiver measures the frequency of the signal and calculates its own speed relative to the satellite based on the Doppler effect. By measuring and calculating the frequencies of multiple satellite signals, GPS receivers can accurately measure their own speed [10]. By calculating the average speed of vehicles on each road section, it can be determined whether there is congestion on that road section. Although these two methods are relatively simple in detecting congestion, they require extensive hardware support. Installing induction coils on various road sections and installing GPS equipment in vehicles will consume a lot of manpower and financial resources. This article will focus on proposing a traffic congestion detection method with less hardware consumption.

The third method is to use camera data and computer vision algorithms to detect whether there is congestion on the road [11,12]. At present, a large number of cameras have been widely deployed in various sections of transportation system. Using these existing cameras for traffic congestion monitoring can reduce hardware resource requirements, but requires the application of complex algorithms for calculations. Therefore, the focus of this research is to use camera data to determine whether there is congestion in each road section. We will study image recognition models for traffic congestion detection. The input of the model is image data captured by cameras, and the output is whether there is congestion on the road section. Currently, there are three main types of research in this field. (1) The first method uses a target detection algorithm to detect the number and location of cars, trucks, and buses in the picture, and based on this, determine whether there is congestion on the road [13,14]. (2) The second method extracts traffic status features from images, including the number of vehicles and their speed, and then uses machine learning models to determine whether there is congestion in the image based on these features [15,16]. (3). The third method uses the image classification model to determine whether there is congestion in the area where the camera is located [17,18].

The target detection models used for traffic congestion detection mainly consist of the following: the Haar Cascade [15,19], You Only Look Once (YOLO) [20,21], Single-Shot MultiBox Detector (SSD) [22,23], and Mask R-Convolutional Neural Networks (R-CNN) [24,25].

Many scholars have conducted traffic-related image detection and image classification based on visual features. In terms of features, they mainly consist of bag features [26], Haar features [27], edge features [28], shape features [29], gradient histograms features [30] and CNN [31].

Computer vision models used for traffic image classification mainly consist of VGG-16 [32,33], ResNet50 [34,35], EfficientNet [36,37], and Inceptionv3 [38,39].

In the problem of traffic congestion detection based on camera data, current research mainly relies on machine learning methods based on feature extraction or image classification models based on deep learning [9,40]. These methods currently have problems such as insufficient detection accuracy and excessive model size. The research in this article is mainly based on object detection models. This article will propose a new vehicle information feature map (VIFM) for extracting traffic congestion features. Meanwhile, this article will propose a new multi-branch convolutional neural network (MBCNN) for traffic congestion detection.

In building an image-based traffic congestion recognition model, the main challenges include accurate detection of vehicles in images, reasonable feature extraction, and efficient classification model construction. The reason why this article adopts a deep learning-based approach is that deep learning has achieved great success in the image field. We apply deep learning methods to traffic congestion recognition in order to make some progress in the problem of traffic congestion recognition.

This study uses a target detection algorithm to process the images captured by the camera to automatically identify traffic elements such as vehicles in the images. Divide the picture into several small squares, and then count the number of vehicles in each small square. This article extracts the maximum value, sum value and number of small squares greater than the threshold to extract higher-level features. By extracting image features through this method, the model can learn the difference between congestion and non-congestion states, laying the foundation for subsequent congestion detection.

Next, this article proposes a new multi-branch convolutional neural network model. The three branches of the model each use a different size of convolution kernels. Finally, the outputs of the three branches are passed through a fully connected layer to determine whether there is congestion. This classification model is used in this article for traffic congestion prediction detection.

Finally, this study combines actual traffic data to train and test the proposed congestion detection method. In the testing phase, the actual traffic conditions are compared with the detection results to evaluate the F1 score and accuracy of the model. The main contribution of this paper is to propose a new method for extracting vehicle information feature maps (VIFMs) and a multi-branch convolutional neural network model (MBCNN). The model proposed in this article will achieve better results compared to existing convolutional neural networks, deep neural networks, and baseline models. The method proposed in this paper can achieve traffic congestion detection based on camera data. Due to the limitations of our experimental conditions, we can only use cameras to capture 2D images. At the same time, our method can effectively provide intelligent algorithms for massive ordinary traffic surveillance cameras without adding additional hardware costs.

2. Method

The traffic congestion detection in this article is mainly divided into three aspects: vehicle target detection, feature map extraction, and classifier detection of whether there is traffic congestion. First, this article establishes a vehicle information extraction model based on the You Only Look Once v8 (YOLOv8) model. Then, this paper proposes a feature map extraction method based on vehicle information. Finally, this article establishes a multi-branch convolutional neural network model for traffic congestion detection.

2.1. Target Detection

This article uses the most advanced YOLOv8 target detection model to detect the location and number of vehicles in each picture. The structural diagram of the YOLOv8 target detection model is shown in Figure 1. For a detailed introduction to the YOLOv8 target detection model, readers can refer to [41,42]. Due to space limitations, this article will not go into details.

2.2. Vehicle Information Feature Map (VIFM)

This paper proposes a vehicle information feature map (VIFM) method. After identifying the location of each car in the picture, the information needs to be processed to extract features as model input for the subsequent classifier. This article divides a two-dimensional image into m*m small squares with equal distance and then counts the number of vehicles in each small square, so that our feature map can be obtained. Its specific definition is as follows:

f_{i, j} = \sum_{n = 1}^{N} I (x_{n} \geq (i - 1) \times w and x_{n} < i \times w and y_{n} \geq (j - 1) \times h and y_{n} < j \times h)

(1)

where f is the calculation result of the feature map, N is the number of detected vehicles, n is the vehicle number, x_n is the x coordinate of the n-th vehicle, and y_n is the y coordinate of the n-th vehicle. w and h are the size of a small square. I is an indicator function defined as follows:

I (E) = {\begin{cases} 1, E = true \\ 0, E = false \end{cases}

(2)

where E is a logical expression. The schematic diagram is as Figure 2. The significance of this formula is that it first divides the image into several rectangular squares at equal distances and then counts how many vehicles are in each small square.

The detailed calculation process of the VIFM algorithm is as follows (Algorithm 1):

Algorithm 1: VIFM

Input: image, Target detection results, Square size m
Output: m*m feature map matrix
high ← image.shape₀
width ← image.shape₁
matrix ←

0^{m \times m}

for (cls, x, y, w, h) ∈ results do:
center_x ← x +

\frac{w}{2}

center_y ← y +

\frac{h}{2}

index_width ← int(

\frac{c e n t e r_x}{w i d t h}

m)
index_high ← int(

\frac{c e n t e r_y}{h i g h}

m)
if cls ≠’bus’ and cls ≠’car’ and cls ≠’trunck’
continue
end if
if index_width ≥ m or index_high ≥ m or index_width < 0 or index_high < 0
continue
end if
matrix[index_width,index_high] ← matrix[index_width,index_high] + 1
end for
return matrix

After obtaining the feature map, we further extract three features from the feature map, namely the total number of feature map, the maximum number of feature map, and the number of elements greater than the threshold. The calculation is as follows:

t o t a l = \sum_{i = 1}^{m} \sum_{j = 1}^{m} m a t r i x_{i, j}

(3)

m a x = \max_{i, j} m a t r i x_{i, j}

(4)

c o u n t = \sum_{i = 1}^{m} \sum_{j = 1}^{m} I (m a t r i x_{i, j} > t h r e s h)

(5)

where I is an indicator function. If the condition is true, its value is 1; otherwise it is 0. Thresh is the vehicle number threshold. In subsequent chapters, after obtaining the feature map and 3 features, this article inputs the feature map and 3 features to the classifier to identify traffic congestion.

2.3. Multi-Branch Convolutional Neural Network (MBCNN)

This paper proposes a multi-branch convolutional neural network (MBCNN) for automatic detection of traffic congestion. Since the features proposed in this article are two-dimensional matrix structures, their characteristics are similar to image data. Therefore, this paper uses convolutional neural network as the classifier. If the value is close to 1, it means it is congested, and if the value is close to 0, it means it is non-congested.

In order to capture the features of different sizes of the image, the image first passes through 3 convolution branches respectively. The convolution kernel sizes in the three branches are 1, 2, and 3 respectively. The padding is set to the same. That is to say, the size of the feature map is always m*m. First, in each of the three branches, the image passes through a convolutional layer, a ReLU layer, a convolutional layer, and a ReLU layer. Then, the features extracted from the three branches are concatenated together. The concatenated feature map passes through a dropout layer, a flattening layer, and finally a fully connected layer with an output length of 3 to obtain the vector representation of our image. The inference formula for the convolutional neural network part of the classifier is as follows:

\begin{array}{l} X_{1}^{p} = W_{1, 1}^{p} \otimes i m a g e + b_{1, 1}^{p}, X_{2}^{p} = W_{1, 2}^{p} \otimes i m a g e + b_{1, 2}^{p}, \\ X_{3}^{p} = W_{1, 3}^{p} \otimes i m a g e + b_{1, 3}^{p} \end{array}

(6)

X_{1} = R e L U (X_{1}), X_{2} = R e L U (X_{2}), X_{3} = R e L U (X_{3})

(7)

\begin{array}{l} X_{1}^{d} = \sum_{p = 1}^{P} W_{2, 1}^{p, d} \otimes X_{1}^{p} + b_{2, 1}^{d}, X_{2}^{d} = \sum_{p = 1}^{P} W_{2, 2}^{p, d} \otimes X_{2}^{p} + b_{2, 2}^{d}, \\ X_{3}^{d} = \sum_{p = 1}^{P} W_{2, 3}^{p, d} \otimes X_{3}^{p} + b_{2, 3}^{d} \end{array}

(8)

X_{1} = R e L U (X_{1}), X_{2} = R e L U (X_{2}), X_{3} = R e L U (X_{3})

(9)

X = X_{1} \oplus X_{2} \oplus X_{3}

(10)

X = Dropout(X,0.5)

(11)

X = Flatten(X)

(12)

X = R e L u (W_{3} X + b_{3})

(13)

where image is the feature map input.

\otimes

is the convolution symbol.

\oplus

represents vector concatenation. W₁ and W₂ are the convolutional kernels of the first two convolutional layers. b₁ and b₂ are the bias terms of the first two convolutional layers. ReLU is a nonlinear activation function. W₃ and b₃ are parameters of the fully connected layer. The output dimension of the last fully connected layer is 3. Dropout is used to prevent overfitting, and its dropout probability is 0.5.

In order to consider the three features of the total number of vehicles, the maximum value, and the count value greater than the threshold, the features extracted by the convolutional neural network are first spliced with this feature. Finally, after two fully connected layers, the value is the congestion confidence. The final fully connected neural network reasoning process is as follows:

X = X \oplus [t o t a l, m a x, c o u n t]

(14)

X = R e L U (W_{4} X + b_{4})

(15)

Y = S i g m o i d (W_{5} X + b_{5})

(16)

where

\oplus

represents vector concatenation. W₄ and W₅ are the parameters of two fully connected layers. b₄ and b₅ are the bias terms of the two fully connected layers. Sigmoid is the final output layer. Y > 0.5 means the picture is congested, and Y < 0.5 means the picture is non-congested. Total is the total number of vehicles in the feature map, max is the maximum value of each element in the feature map, and count is the number of elements in the feature map that are greater than the threshold.

The structure of the classifier in this article is shown in the Figure 3. Among them, input1 is the feature map input, and input2 is the total number of vehicles, the maximum value and the number of elements greater than the threshold. The output is the confidence probability that there is traffic congestion in this image.

The loss function for this model training is the cross-entropy loss function, which is defined as follows:

l o s s = - \frac{1}{b a t c h_s i z e} \sum_{j = 1}^{b a t c h_s i z e} [y_{j} \log {\hat{y}}_{j} + (1 - y_{j}) \log (1 - {\hat{y}}_{j})]

(17)

The classification model in this article is trained using the Adam optimization algorithm. Adam is a common optimization algorithm; readers can refer to [43].

2.4. Evaluation Index

This article uses F1 score as the main evaluation index for traffic congestion detection, which is as follows:

F 1 = 2 \times \frac{p r e c i s i o n \times r e c a l l}{p r e c i s i o n + r e c a l l}

(18)

The second indicator in this article is accuracy, which is as follows:

A c c u r a c y = (T P + T N) / (T P + T N + F P + F N)

(19)

where TP is the true positive example, that is, the number of pictures in that actual category that are congested and are identified as congested. TN is the true negative example, that is, the number of pictures in that actual category that are non-congested and are recognized as non-congested. FP is the false positive example, that is, the number of pictures whose actual category is congestion and are recognized as non-congestion. FN is the false negative example, that is, the number of images in that actual category that are non-congested and are identified as congested.

3. Numerical Experiments

This section will conduct numerical experiments for traffic congestion detection based on actual data. Section 3.1 introduces the data used for the numerical experiments. Section 3.2 discusses the results of object detection. Section 3.3 presents the results of feature map extraction. Section 3.4 presents the classification results. Section 3.5 compares the detection accuracy of different classification models and different object detection models. Section 3.6 shows the experimental results on other datasets. Section 3.7 analyzes time and space complexity. Section 3.8 shows the choice of hyperparameter m.

3.1. Dataset

The Chinese City Traffic Image Database (CCTRIB) is an image dataset used for road congestion status detection [44]. CCTRIB images come from traffic videos captured by multiple city key road surveillance cameras, including surveillance videos on highways, urban roads, and expressways. The frequency of video collection is once every 500 frames. The videos collected on each key road include various situations such as lighting changes, weather changes, and different imaging scales.

The dataset has a total of 9200 images, including 4600 traffic congestion images and 4600 non-congestion traffic images. The image resolution is between 480 × 320 and 1920 × 1080 pixels, which can be used for training and testing of road congestion detection algorithms. There are 8471 pictures in the training set and 729 pictures in the test set. The proportion of congestion and non-congestion on both the training set and the test set is 50%. Figure 4 is an example of the CCTRIB dataset. The upper part is an example of a congested image. The lower part is an example of a non-congested image.

The congested images include 1160 daytime images and 870 nighttime images under clear skies, 970 daytime images and 800 night images under cloudy conditions, and 390 daytime images and 410 night images in rain and mist. The non-congested images include 940 daytime images and 1160 night images under clear conditions, 890 daytime images and 890 night images under cloudy conditions, 390 daytime images and 330 night images under rainy and foggy weather conditions. Figure 5 is the distribution diagram of the CCTRIB dataset images.

3.2. Target Detection Results

This article first performs target detection on camera images based on the YOLOv8 model and extracts the location and quantity information of vehicles. This article detects vehicle targets in all images in the training set and test set, and plots the target locations and confidence levels. The effect of target detection on the congestion category dataset is as follows (Figure 6):

The detection effect of the target detection algorithm on non-congested datasets is as follows (Figure 7):

3.3. Feature Map

This article extracts a feature map based on the vehicle position information obtained by the target detection model and uses it for the final classification model. We divide the entire image into m*m squares. In this experiment, we set the parameter m = 6. The reason why we determined m = 6 is that we conducted 5-fold cross-validation on the training set, and m = 6 can produce a relatively optimal result. In Section 3.8, we will discuss the value of m in detail. Then count the number of vehicles in each square. Finally, the feature map is used as the input of the classification model. Figure 8 shows the feature map extraction results of four images. The categories of the upper two images are traffic congestion, and the categories of the lower two images are non-congestion categories.

The above two pictures are feature maps of congested pictures. The following two pictures are feature maps of non-congested pictures.

3.4. Classification Results

We use the training set data of CCTRIB to train our MBCNN model and evaluate the accuracy of the prediction results on the test set. There are 8471 pictures in the training set and 729 pictures in the test set. The overall prediction process is to first use the pre-trained YOLOv8 model to detect vehicles, then use a VIFM to extract feature maps and advanced features, and finally use the MBCNN model to predict whether there is traffic congestion in the image.

We set the size of the feature map to 6*6 and set the element size threshold thresh = 3. For the first branch, the convolution kernel size is set to 1*1, and the output channel numbers of the two convolution layers are 64 and 32 respectively. The nonlinear activation function is ReLU, and the padding is set to the same. For the second branch, the convolution kernel size is set to 2*2, and the output channel numbers of the two convolution layers are 64 and 32, respectively. The nonlinear activation function is ReLU, and the padding is set to the same. For the third branch, the convolution kernel size is set to 3*3, and the channel numbers of the two convolution layers are 64 and 32, respectively. The nonlinear activation function is ReLU, and the padding is set to the same. The dropout layer has an inactivation probability of 0.5. The output size of the first fully connected layer is 3, and the activation function is ReLU. The output size of the second fully connected layer is 12, and the activation function is ReLU. The output size of the third fully connected layer is 1, and the activation function is the sigmoid function.

This article first shows the prediction results of some data in the test set. Figure 9 shows 16 pictures predicted to be congested. Figure 10 shows 16 pictures predicted to be non-congested. As can be seen from the figure, the method in this paper can effectively predict traffic congestion. The threshold for this experiment is 0.5. If the confidence level is greater than 0.5, it indicates a higher likelihood that the image is a congested image. Otherwise, it indicates a higher likelihood that the image is a non-congested image. In the entire test set, the F1 score of this algorithm is 98.61%, and the accuracy is 98.62%.

3.5. Comparative Experiment

This article uses the CNN model as the final classification model. First, our baseline model is a model that only considers the number of vehicles, that is, using YOLOv8 to detect vehicles and then counting the number in each image. Images that are larger than the threshold are identified as showing traffic congestion, while images that are smaller than the threshold are identified as showing non-congestion. The threshold is determined using a logistic regression model [45].

This article compares this model with feed-forward neural networks [46], support vector machines [47], etc. During comparison, the feature map is first converted into a 1-dimensional vector and then input into the classification model for classification.

The number of hidden layer units of the feed-forward neural network is set to 256, the hidden layer activation function is set to the ReLU function, the number of output layer units is 1, the activation function is the sigmoid function, the binary cross-entropy loss function is used, and the optimization algorithm uses the Adam algorithm. The parameter C of the SVM classifier is set to 1, the kernel function is the rbf function, the order is set to 3, the minimum tolerated accuracy is set to 0.001, the block size is set to 200, and the decision function is ovr.

At the same time, this article also compares the classification accuracy of several large-scale deep neural networks, as shown in Table 1. We utilize a large pre-trained network to extract features directly on the original images. For the VGG16 model [48], the output is a 512-dimensional feature vector. For the Resnet50 [49] model, the output is a 2048-dimensional feature vector. For the EfficientNet b7 [50] model, the output is a 2560-dimensional feature vector. For large pre-trained networks, the final classifier settings are thus: the number of hidden layer units is 128, the activation function is the ReLU function, the number of output layer units is 1, and the activation function is the sigmoid function. It is worth noting that it is still relatively rare to use convolutional neural networks for traffic congestion detection. The convolutional neural network references cited in this article have applications in other fields. These models are mainly used for comparative studies.

In order to choose the optimal target detection model, this article also compares the traffic congestion detection results obtained when using YOLOv8 and YOLOv5 [20,21], SSD [22,23] and the Haar Cascade models [19] (Table 2).

In order to more intuitively display the detection effects of the four target detection models, we draw the detection results of a picture in the dataset, and the results are in Figure 11. It can be seen that the YOLOv8 and YOLOv5 models achieve relatively better results.

3.6. Experiments on Other Datasets

To further verify the generalization performance of the proposed method, we conducted inference experiments on a subset of data from the traffic net dataset [51]. The traffic net dataset has a total of four categories of data, including accidents, dense traffic, fires, and sparse traffic. Each category has 1100 images.

We trained our model based on the data of dense traffic and sparse traffic categories in the traffic net dataset and validate the effectiveness of various methods on the test set. The hyperparameter calibration of these models is the same as that in Section 3.5. The results are shown in Table 3. It can be seen that the model in this paper achieves better results than the baseline model, machine learning model, and pre-trained convolutional neural network model. It is worth noting that due to the different installation locations of some cameras in the traffic net dataset compared to CCTRIB, and because the training set of the traffic-net dataset is only 1800 images, the generalization performance is slightly worse, which is normal.

We extracted 10 images each from the dense and sparse traffic categories, for a total of 20 images, and used the MBCNN model for inference. The results are as follows (Figure 12):

As can be seen from the figure, the model in this article can effectively identify congested and non-congested images in the traffic net dataset.

3.7. Time and Space Complexity Analysis

In order to more comprehensively analyze the performance of this algorithm, this article compares the time complexity and space complexity of the algorithm. The main frequency of our computer is 1.9 HZ. The memory size is 16 GB, and the operating system is 64-bit. The analysis results are shown in Table 4. It can be seen that the time and space complexity of this method can well meet the needs of the current computing power.

3.8. The Influence of Hyperparameter m on the Experimental Results

To determine the impact of the feature map size m on the experimental classification accuracy, we used a 5-fold cross-validation method with a dataset. That is, 20% of the data from the training set were used as the validation set. This article tested the classification performance on the validation set when m was 4, 5, 6, and 7. The results are shown in the following table. From Table 5, it can be seen that the optimal value of m is 6.

4. Conclusions

This article studies an automatic detection method of traffic congestion based on image data and proposes a new traffic congestion feature extraction method VIFM and a new traffic congestion classifier MBCNN, which can provide effective data support for traffic management, reduce system operating costs, and provide new methods for automatic detection.

This method is based on the YOLOv8 model to detect vehicle information in images. Based on the VIFM method, it extracts feature maps and high-level features of vehicle information. This method uses MBCNN to identify whether the image is a traffic congestion image. The method proposed in this paper can effectively utilize the existing massive network of cameras in the traffic system to automatically detect traffic congestion without increasing hardware costs.

In Section 3, the following are discussed. (1) This article analyzes the detection results of YOLOv8. (2) This article analyzes the results of VIFM extracting feature maps. (3) This article analyzes the results of MBCNN for congestion recognition. (4) This article compares the congestion recognition results of different classifiers and different object detection models. (5) This article analyzes the spatial and temporal complexity of our method.

This article verifies the effectiveness of the proposed method on the CCTRIB dataset. (1) This article verifies the effectiveness of the VIFM method and the MBCNN classifier. The classification model in this article achieves an F1 score of 98.61% and an accuracy of 98.62% on the test set. (2) This article compares the detection accuracy of this model with other object detection models, other convolutional neural network models, other deep learning models, and baseline models (see Table 1, Table 2 and Table 3). Numerical experiments show that the proposed method achieves good results.

In future research, we will study end-to-end multi-stage traffic congestion recognition methods which integrate target detection, feature map extraction, and congestion classification into an end-to-end model to further improve the accuracy and computational efficiency of traffic congestion recognition.

Author Contributions

Conceptualization, S.J. and Y.F.; methodology, S.J.; validation, S.J.; formal analysis, S.J. and Y.F.; data curation, S.J.; writing-original draft preparation, S.J.; writing-review and editing, S.J., Y.F., W.Z., X.D. and B.O.O.; visualization, S.J.; supervision, Y.F., W.Z. and X.L.; project administration, Y.F. and W.Z.; funding acquisition, Y.F. and W.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Science and Technology Research Program of Chongqing Municipal Education Commission (KJZD-M202201204, KJZD-K202201205) and the Science and Technology Innovation Smart Agriculture Project of Science and Technology Department, Wanzhou District of Chongqing (Grant No. 2022-17), the Rural Revitalization Special Project of Chongqing Science and Technology Bureau (No. CSTB2023TIAD-ZXX0017), the Foundation of Intelligent Ecotourism Subject Group of Chongqing Three Gorges University (No. zhlv20221028), Natural Science Foundation of Chongqing (Grant No. 2024NSCQ-LZX0121).

Data Availability Statement

The data that support the findings of this study are available from OpenITS Open Data, but restrictions apply to the availability of these data, which were used under license for the current study, and so the data are not publicly available. Data are however available from the authors upon reasonable request and with the permission of OpenITS Open Data.

Acknowledgments

The authors gratefully acknowledge OpenITS Open Data for their effort in making the data available.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Kumar, N.; Raubal, M. Applications of deep learning in congestion detection, prediction and alleviation: A survey. Transp. Res. Part C Emerg. Technol. 2021, 133, 103432. [Google Scholar] [CrossRef]
Li, Z.; Yu, H.; Zhang, G.; Dong, S.; Xu, C.Z. Network-wide traffic signal control optimization using a multi-agent deep reinforcement learning. Transp. Res. Part C Emerg. Technol. 2021, 125, 103059. [Google Scholar] [CrossRef]
Jiang, S.; Feng, Y.; Liao, X.; Wu, H.; Liu, J.; Onasanya, B.O. A Novel Spatiotemporal Periodic Polynomial Model for Predicting Road Traffic Speed. Symmetry 2024, 16, 537. [Google Scholar] [CrossRef]
Sadollah, A.; Gao, K.; Zhang, Y.; Zhang, Y.; Su, R. Management of traffic congestion in adaptive traffic signals using a novel classification-based approach. Eng. Optim. 2019, 51, 1509–1528. [Google Scholar] [CrossRef]
Jeng, S.T.; Chu, L. A high-definition traffic performance monitoring system with the inductive loop detector signature technology. In Proceedings of the 17th International IEEE Conference on Intelligent Transportation Systems (ITSC), Qingdao, China, 8–11 October 2014; pp. 1820–1825. [Google Scholar]
Tasgaonkar, P.P.; Garg, R.D.; Garg, P.K. Vehicle detection and traffic estimation with sensors technologies for intelligent transportation systems. Sens. Imaging 2020, 21, 29. [Google Scholar] [CrossRef]
Tong, L.; Li, Z. Study on the road traffic survey system based on micro-ferromagnetic induction coil sensor. Sens. Transducers 2014, 170, 73. [Google Scholar]
Thianniwet, T.; Phosaard, S.; Pattara-Atikom, W. Classification of road traffic congestion levels from GPS data using a decision tree algorithm and sliding windows. Proc. World Congr. Eng. 2009, 1, 1–3. [Google Scholar]
Yong-chuan, Z.; Xiao-qing, Z.; Zhen-ting, C. Traffic congestion detection based on GPS floating-car data. Procedia Eng. 2011, 15, 5541–5546. [Google Scholar] [CrossRef]
Keskin, M.; Akkamis, M.; Sekerli, Y.E. An overview of GNSS and GPS based velocity measurement in comparison to other techniques. In Proceedings of the International Conference on Energy Research, Alanya, Turkey, 31 October–2 November 2018; pp. 392–404. [Google Scholar]
Cui, H.; Yuan, G.; Liu, N.; Xu, M.; Song, H. Convolutional neural network for recognizing highway traffic congestion. J. Intell. Transp. Syst. 2020, 24, 279–289. [Google Scholar] [CrossRef]
Calderoni, L.; Maio, D.; Rovis, S. Deploying a network of smart cameras for traffic monitoring on a “city kernel”. Expert Syst. Appl. 2014, 41, 502–507. [Google Scholar] [CrossRef]
Iftikhar, S.; Asim, M.; Zhang, Z.; Muthanna, A.; Chen, J.; El-Affendi, M.; Abd El-Latif, A.A. Target detection and recognition for traffic congestion in smart cities using deep learning-enabled UAVs: A review and analysis. Appl. Sci. 2023, 13, 3995. [Google Scholar] [CrossRef]
Liu, X.; Gao, W.; Feng, D.; Gao, X. Abnormal traffic congestion recognition based on video analysis. In Proceedings of the 2020 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), Shenzhen, China, 6–8 August 2020; pp. 39–42. [Google Scholar]
Soo, S. Object Detection Using Haar-Cascade Classifier; Institute of Computer Science, University of Tartu: Tartu, Estonia, 2014; Volume 2, pp. 1–12. [Google Scholar]
Saeedmanesh, M.; Kouvelas, A.; Geroliminis, N. An extended Kalman filter approach for real-time state estimation in multi-region MFD urban networks. Transp. Res. Part C Emerg. Technol. 2021, 132, 103384. [Google Scholar] [CrossRef]
Chakraborty, P.; Adu-Gyamfi, Y.O.; Poddar, S.; Ahsani, V.; Sharma, A.; Sarkar, S. Traffic congestion detection from camera images using deep convolution neural networks. Transp. Res. Rec. 2018, 2672, 222–231. [Google Scholar] [CrossRef]
Lam, C.T.; Gao, H.; Ng, B. A real-time traffic congestion detection system using on-line images. In Proceedings of the 2017 IEEE 17th International Conference on Communication Technology (ICCT), Chengdu, China, 27–30 October 2017; pp. 1548–1552. [Google Scholar]
Aarthi, V. Prototype Design of Intelligent Traffic Signal Control using Haar Cascade Classifier. In Proceedings of the 2021 Sixth International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), Chennai, India, 25–27 March 2021; pp. 260–264. [Google Scholar]
Lin, J.P.; Sun, M.T. A YOLO-based traffic counting system. In Proceedings of the 2018 Conference on Technologies and Applications of Artificial Intelligence (TAAI), Taichung, Taiwan, 30 November–December 2018; pp. 82–85. [Google Scholar]
Al-qaness, M.A.; Abbasi, A.A.; Fan, H.; Ibrahim, R.A.; Alsamhi, S.H.; Hawbani, A. An improved YOLO-based road traffic monitoring system. Computing 2021, 103, 211–230. [Google Scholar] [CrossRef]
Biswas, D.; Su, H.; Wang, C.; Stevanovic, A.; Wang, W. An automatic traffic density estimation using Single Shot Detection (SSD) and MobileNet-SSD. Phys. Chem. Earth Parts A/B/C 2019, 110, 176–184. [Google Scholar] [CrossRef]
You, S.; Bi, Q.; Ji, Y.; Liu, S.; Feng, Y.; Wu, F. Traffic sign detection method based on improved SSD. Information 2020, 11, 475. [Google Scholar] [CrossRef]
Zuo, Z.; Yu, K.; Zhou, Q.; Wang, X.; Li, T. Traffic signs detection based on faster r-cnn. In Proceedings of the 2017 IEEE 37th International Conference on Distributed Computing Systems Workshops (ICDCSW), Atlanta, GA, USA, 5–8 June 2017; pp. 286–288. [Google Scholar]
Luo, J.Q.; Fang, H.S.; Shao, F.M.; Zhong, Y.; Hua, X. Multi-scale traffic vehicle detection based on faster R–CNN with NAS optimization and feature enrichment. Def. Technol. 2021, 17, 1542–1554. [Google Scholar] [CrossRef]
Nguyen, H.N.; Krishnakumari, P.; Vu, H.L.; Van Lint, H. Traffic congestion pattern classification using multi-class svm. In Proceedings of the 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), Rio de Janeiro, Brazil, 1–4 November 2016; pp. 1059–1064. [Google Scholar]
Khalifa, A.B.; Alouani, I.; Mahjoub, M.A.; Amara, N.E.B. Pedestrian detection using a moving camera: A novel framework for foreground detection. Cogn. Syst. Res. 2020, 60, 77–96. [Google Scholar] [CrossRef]
Wu, B.; Nevatia, R. Detection of multiple, partially occluded humans in a single image by bayesian combination of edgelet part detectors. In Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV’05), Beijing, China, 17–21 October 2005; Volume 1, pp. 90–97. [Google Scholar]
Xie, Z.; Yang, R.; Guan, W.; Niu, J.; Wang, Y. A Novel Descriptor for Pedestrian Detection Based on Multi-layer Feature Fusion. In Proceedings of the 2020 IEEE International Conference on Real-Time Computing and Robotics (RCAR), Asahikawa, Japan, 28–29 September 2020; pp. 146–151. [Google Scholar]
Patel, C.I.; Labana, D.; Pandya, S.; Modi, K.; Ghayvat, H.; Awais, M. Histogram of oriented gradient-based fusion of features for human action recognition in action video sequences. Sensors 2020, 20, 7299. [Google Scholar] [CrossRef]
Wali, S.B.; Abdullah, M.A.; Hannan, M.A.; Hussain, A.; Samad, S.A.; Ker, P.J.; Mansor, M.B. Vision-based traffic sign detection and recognition systems: Current trends and challenges. Sensors 2019, 19, 2093. [Google Scholar] [CrossRef]
Singh, I.; Singh, S.K.; Kumar, S.; Aggarwal, K. Dropout-VGG based convolutional neural network for traffic sign categorization. In Congress on Intelligent Systems: Proceedings of CIS; Springer Nature: Singapore, 2022; pp. 247–261. [Google Scholar]
Boudissa, M.; Kawanaka, H.; Wakabayashi, T. Traffic Landmark Quality Evaluation Using Efficient VGG-16 model. In Proceedings of the 2022 Joint 12th International Conference on Soft Computing and Intelligent Systems and 23rd International Symposium on Advanced Intelligent Systems (SCIS&ISIS), Ise, Japan, 29 November–2 December 2022; pp. 1–5. [Google Scholar]
Prawinsankar, D.; Gunasekaran, M.; Gopalakrishnan, B.; Purusothaman, P. Traffic Congession Detection through Modified Resnet50 and Prediction of Traffic using Clustering. In Proceedings of the 2021 Smart Technologies, Communication and Robotics (STCR), Sathyamangalam, India, 9–10 October 2021; pp. 1–5. [Google Scholar]
Wang, Y.; Zhao, Z.; He, J.; Zhu, Y.; Wei, X. A method of vehicle flow training and detection based on ResNet50 with CenterNet method. In Proceedings of the 2021 International Conference on Communications, Information System and Computer Engineering (CISCE), Beijing, China, 14–16 May 2021; pp. 335–339. [Google Scholar]
Zhang, S.; Bu, Y.; Chen, B.; Lu, X. Transfer learning for encrypted malicious traffic detection based on efficientnet. In Proceedings of the 2021 3rd International Conference on Advances in Computer Technology, Information Science and Communication (CTISC), Shanghai, China, 23–25 April 2021; pp. 72–76. [Google Scholar]
Koonce, B.; Koonce, B.E. Convolutional Neural Networks with Swift for Tensorflow: Image Recognition and Dataset Categorization; Apress: New York, NY, USA, 2021; pp. 109–123. [Google Scholar]
Kheder, M.Q.; Mohammed, A.A. Transfer Learning Based Traffic Light Detection and Recognition Using CNN Inception-V3 Model. Iraqi J. Sci. 2023, 64, 6258–6275. [Google Scholar] [CrossRef]
Lin, C.; Li, L.; Luo, W.; Wang, K.C.; Guo, J. Transfer learning based traffic sign recognition using inception-v3 model. Periodica Polytechnica Transp. Eng. 2019, 47, 242–250. [Google Scholar] [CrossRef]
Impedovo, D.; Balducci, F.; Dentamaro, V.; Pirlo, G. Vehicular traffic congestion classification by visual features and deep learning approaches: A comparison. Sensors 2019, 19, 5213. [Google Scholar] [CrossRef] [PubMed]
Talaat, F.M.; ZainEldin, H. An improved fire detection approach based on YOLO-v8 for smart cities. Neural Comput. Appl. 2023, 35, 20939–20954. [Google Scholar] [CrossRef]
Sohan, M.; Sai Ram, T.; Reddy, R.; Venkata, C. A Review on YOLOv8 and Its Advancements. In Proceedings of the International Conference on Data Intelligence and Cognitive Informatics, Tirunelveli, India, 27–28 June 2023; Springer: Singapore, 2024; pp. 529–545. [Google Scholar]
LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
Li, X.; OpenITS Org. OpenData V15.0-Chinese City Traffic Image Database (CCTRIB). 2022. Available online: http://www.openits.cn/openData4/824.jhtml (accessed on 1 December 2023).
LaValley, M.P. Logistic regression. Circulation 2018, 117, 2395–2399. [Google Scholar] [CrossRef]
Aldakheel, F.; Satari, R.; Wriggers, P. Feed-forward neural networks for failure mechanics problems. Appl. Sci. 2021, 11, 6483. [Google Scholar] [CrossRef]
Xu, L.; Wang, X.; Bai, L.; Xiao, J.; Liu, Q.; Chen, E.; Jiang, X.; Luo, B. Probabilistic SVM classifier ensemble selection based on GMDH-type neural network. Pattern Recognit. 2020, 106, 107373. [Google Scholar] [CrossRef]
Theckedath, D.; Sedamkar, R.R. Detecting affect states using VGG16, ResNet50 and SE-ResNet50 networks. SN Comput. Sci. 2020, 1, 79. [Google Scholar] [CrossRef]
Hossain, M.B.; Iqbal, S.H.S.; Islam, M.M.; Akhtar, M.N.; Sarker, I.H. Transfer learning with fine-tuned deep CNN ResNet50 model for classifying COVID-19 from chest X-ray images. Inform. Med. Unlocked 2022, 30, 100916. [Google Scholar] [CrossRef]
Reza, A.W.; Hasan, M.M.; Nowrin, N.; Shibly, M.A. Pre-trained deep learning models in automatic COVID-19 diagnosis. Indones. J. Electr. Eng. Comput. Sci. 2021, 22, 1540–1547. [Google Scholar] [CrossRef]
Olafenwa, M. Traffic Net Dataset. Available online: https://github.com/OlafenwaMoses/Traffic-Net (accessed on 10 May 2024).

Figure 1. Simplified structure diagram of YOLOv8 model [39].

Figure 2. Schematic diagram of feature map extraction.

Figure 3. Classifier structure diagram based on convolutional neural network.

Figure 4. Examples of CCTRIB dataset images.

Figure 5. Image distribution of the CCTRIB dataset.

Figure 6. Object detection results of images from the category “congested”. Each picture in the figure is a picture from the congestion category randomly selected from the dataset.

Figure 7. Object detection results from images with the category “non-congested”. Each picture in the figure is a non-congested picture randomly selected from the dataset.

Figure 8. Camera image feature map extraction results.

Figure 9. Traffic congestion prediction results. The confidence level is close to 1, which represents traffic congestion.

Figure 10. Traffic congestion prediction results. The confidence level is close to 0, which represents non-traffic congestion.

Figure 11. Detection results of four target detection models.

Figure 12. The inference results of the method on the traffic net dataset. The text in each image is the prediction category, and the number is the prediction confidence.

Table 1. Comparison between the classification model of this article and other classification models.

Model	F1 (%)	Accuracy (%)
Baseline [45]	85.23	85.45
MBCNN (ours)	98.61	98.62
FNN [46]	97.04	97.11
SVM [47]	98.17	97.80
CNN(VGG16) [48]	98.12	98.08
CNN(Resnet50) [49]	97.17	97.11
CNN(EfficientNe_b7) [50]	95.25	95.19

Table 2. Comparison of target detection models.

	F1 (%)	Accuracy (%)
Haar Cascade [19]	71.54	75.44
YOLOv8 [20]	98.61	98.62
YOLOv5 [21]	95.38	95.47
SSD [22]	72.49	76.26

Table 3. Comparison of prediction results on the traffic net dataset.

Model	F1 (%)	Accuracy (%)
Baseline	75.68	77.5
MBCNN (ours)	85.38	87.25
FNN	83.05	85.25
SVM	82.35	85
CNN(VGG16)	83.74	83.25
CNN(Resnet 50)	79.09	79.25
CNN(EfficientNe_b7)	77.51	73.75

Table 4. Time and space complexity analysis results.

Attributes	Average Value
YOLOv8 inference time	790 ms
Feature map extraction time	negligible
Classifier inference time	0.13 ms
Object detection model size	133 MB
Classification model size	62 KB

Table 5. Results of different hyperparameters m.

m Value	F1 (%)	Accuracy (%)
4	92.15	95.91
5	98.17	98.07
6	98.29	98.64
7	94.33	97.08

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Jiang, S.; Feng, Y.; Zhang, W.; Liao, X.; Dai, X.; Onasanya, B.O. A New Multi-Branch Convolutional Neural Network and Feature Map Extraction Method for Traffic Congestion Detection. Sensors 2024, 24, 4272. https://doi.org/10.3390/s24134272

AMA Style

Jiang S, Feng Y, Zhang W, Liao X, Dai X, Onasanya BO. A New Multi-Branch Convolutional Neural Network and Feature Map Extraction Method for Traffic Congestion Detection. Sensors. 2024; 24(13):4272. https://doi.org/10.3390/s24134272

Chicago/Turabian Style

Jiang, Shan, Yuming Feng, Wei Zhang, Xiaofeng Liao, Xiangguang Dai, and Babatunde Oluwaseun Onasanya. 2024. "A New Multi-Branch Convolutional Neural Network and Feature Map Extraction Method for Traffic Congestion Detection" Sensors 24, no. 13: 4272. https://doi.org/10.3390/s24134272

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

A New Multi-Branch Convolutional Neural Network and Feature Map Extraction Method for Traffic Congestion Detection

Abstract

1. Introduction

2. Method

2.1. Target Detection

2.2. Vehicle Information Feature Map (VIFM)

2.3. Multi-Branch Convolutional Neural Network (MBCNN)

2.4. Evaluation Index

3. Numerical Experiments

3.1. Dataset

3.2. Target Detection Results

3.3. Feature Map

3.4. Classification Results

3.5. Comparative Experiment

3.6. Experiments on Other Datasets

3.7. Time and Space Complexity Analysis

3.8. The Influence of Hyperparameter m on the Experimental Results

4. Conclusions

Author Contributions

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI