Article

Efficacy Evaluation of You Only Learn One Representation (YOLOR) Algorithm in Detecting, Tracking, and Counting Vehicular Traffic in Real-World Scenarios, the Case of Morelia México: An Artificial Intelligence Approach

by José A. Guzmán-Torres 1,*, Francisco J. Domínguez-Mota 1,2,†, Gerardo Tinoco-Guerrero 1,†, Maybelin C. García-Chiquito 1,† and José G. Tinoco-Ruíz 2,†

1 Civil Engineering Faculty, Universidad Michoacana de San Nicolás de Hidalgo, Morelia 58030, Michoacán, Mexico
2 Faculty of Physics and Mathematics, Universidad Michoacana de San Nicolás de Hidalgo, Morelia 58030, Michoacán, Mexico
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
AI 2024, 5(3), 1594-1613; https://doi.org/10.3390/ai5030077
Submission received: 1 August 2024 / Revised: 28 August 2024 / Accepted: 29 August 2024 / Published: 4 September 2024
(This article belongs to the Section AI Systems: Theory and Applications)

Abstract

This research explores the efficacy of the YOLOR (You Only Learn One Representation) algorithm integrated with the Deep Sort algorithm for real-time vehicle detection, classification, and counting in Morelia, Mexico. The study aims to enhance traffic monitoring and management by leveraging advanced deep learning techniques. The methodology involves deploying the YOLOR model at six key monitoring stations, with varying confidence levels and pre-trained weights, to evaluate its performance across diverse traffic conditions. The results demonstrate that the model is effective compared to other approaches in classifying multiple vehicle types. The combination of YOLOR and Deep Sort proves effective in tracking vehicles and distinguishing between different types, providing valuable data for optimizing traffic flow and infrastructure planning. This innovative approach offers a scalable and precise solution for intelligent traffic management, establishing new methodologies for urban traffic monitoring systems.

1. Introduction

In urban environments, regardless of their size, mobility remains a central issue for the progress of any nation or region [1]. Efficient mobility and effective city traffic management are crucial for sustaining economic activities and minimizing the unfortunate fatalities from traffic accidents [2,3].
In Mexico, a developing country, towns and cities are categorized by population size into small, medium, and large. Some of these cities are experiencing rapid growth, underscoring the urgent need for new roadways, road expansions, maintenance of existing roadways, and the construction of advanced roadway infrastructure to support and manage the increasing number of vehicles.
A critical aspect of the analysis is urban traffic management, which significantly impacts the developmental trajectory of urbanizing cities [4,5].
Urban traffic management involves substantial challenges, as in the case of Morelia, Mexico, a fast-growing, medium-sized city whose rapid population growth and increased vehicle ownership have resulted in severe congestion on its roadways.
As cities expand, the vehicular infrastructure often fails to keep pace with the rising number of vehicles, leading to traffic bottlenecks, delays, and reduced overall transportation system efficiency [6]. The challenges mentioned above are amplified in tourist cities like Morelia, whose cultural richness and historical significance [7] attract residents and tourists, further straining the transportation networks.
The urbanization process in cities like Morelia introduces complex traffic dynamics influenced by various factors such as land use patterns, economic activities, and social behaviors [8]. The absence of comprehensive urban planning strategies and adequate transportation infrastructure worsens these complexities, leading to inefficient traffic flow and safety risks, underscoring the multi-faceted nature of the issue [9].
To address this problem, it is necessary to integrate innovative solutions with advanced technologies, data analytics, and holistic urban planning approaches [10] to effectively manage traffic and ensure sustainable mobility in rapidly urbanizing cities like Morelia. One of the main tasks for making this a reality is correctly measuring the number of vehicles traveling on a particular road. In developing countries like Mexico, traditional traffic management methods are predominantly used to regulate the traffic system. These conventional systems often depend on static, obsolete infrastructure and manual vehicle counting, which have several limitations in addressing the dynamic nature of urban traffic and lead to subjective and inaccurate assessments. A typical example is the traditional counter used to record the number of vehicles traveling on a roadway. Figure 1 shows a traditional pneumatic counter for counting the vehicles circulating on the road.
These traditional methodologies are not accurate and require constant surveillance of the device, which is prone to theft; such approaches consume considerable human and economic resources. The essence of a study using this kind of device is counting the number of repetitions recorded by the counter, which is then converted into a number of vehicles. This transformation can be inefficient and subjective because the counter does not distinguish which kind of vehicle has crossed the pneumatic tubes. Figure 2 illustrates this point, showing the counter registering the crossing of two vehicles.
Figure 2 also highlights the need for more accurate and efficient vehicle-counting methods: in this case, two different types of vehicles cross the counter, yet the damage each vehicle produces on the pavement surface is completely different.
Moreover, traditional approaches typically lack real-time data and predictive capabilities, making it difficult to anticipate and alleviate traffic congestion effectively [11]. Various technological approaches exist to mitigate these limitations, and artificial intelligence (AI) is one of the most innovative solutions.
This research investigates the performance of a complex deep learning (DL) algorithm to enhance the efficiency of vehicle-counting methods in Morelia and potentially extend this approach to other zones across the country. The aim of utilizing DL in this context is to address the current challenges of urban traffic management by leveraging the power of advanced DL algorithms to analyze large volumes of traffic data [12]. For instance, adaptive traffic light control systems powered by DL are being developed to optimize traffic flow in real-time, highlighting the growing role of AI in traffic management [13].
In recent years, DL techniques have been successfully applied in various fields of civil engineering [14], from predicting the behavior of construction materials such as concrete with high accuracy to detecting and classifying damage in concrete and asphalt structures using complex convolutional neural networks [15,16,17,18].
This research focuses on adapting and testing a precise and scalable solution for the automatic detection, classification, and counting of vehicles in real-time from video recordings captured at selected monitoring stations throughout the city. This approach might provide insights into traffic patterns, identify congestion hotspots, and optimize traffic flow in Morelia. The complex architecture analyzed for this purpose is the YOLOR (You Only Learn One Representation) algorithm in conjunction with the Deep Sort algorithm. The YOLOR algorithm was tested at six monitoring stations in inference mode to explore the application of a computer vision system as a potential solution for building intelligent traffic monitoring systems.
The analysis includes varying confidence levels during the inference mode to determine the accuracy of vehicle-type classification. Additionally, different pre-trained weights (representing various levels of model complexity) were used during the video record analysis stage, ranging from the simplest to the most complex architectures. Consequently, the research evaluates the model’s performance in detecting, classifying, and counting vehicles under specific scenarios and conditions presented by Morelia City.
With this research, one of the most complex and recent state-of-the-art computer vision model architectures is tested under various conditions to evaluate its efficacy in counting vehicle tasks in real-world scenarios.

2. The AI Approach in Traffic Management

As mentioned, advanced technologies need to be integrated into traffic-monitoring systems. Past research has addressed the rising challenges posed by the increasing number of vehicles, which complicates traffic dynamics and management [19]. Notable studies such as “A Real-Time Vehicle Counting, Speed Estimation, and Classification System Based on Virtual Detection Zone and YOLO” have contributed to the understanding and development of real-time vehicle detection and classification systems using complex algorithms like YOLO [20]. Recent advancements, such as the fine-tuning of the YOLO-v5 architecture, have significantly improved vehicle detection accuracy in complex traffic environments, demonstrating the potential of DL in real-time traffic monitoring [21]. These systems have demonstrated substantial improvements in traffic monitoring efficiency.
The integration of convolutional neural networks with YOLO has led to enhanced accuracy in vehicle detection, facilitating more efficient traffic monitoring systems [22]. Muhammad Azhad and Fadhlan Hafizhelmi have demonstrated the integration of YOLO and Deep Sort for vehicle detection and tracking, noting enhancements with the use of YOLOv4 for real-time applications in traffic management. Their research has achieved state-of-the-art results, supporting the effectiveness of combining DL algorithms and video surveillance technologies to enhance vehicle-counting and -tracking capabilities [23]. Similarly, Al-Qaness et al. have presented an improved YOLO-based vehicle detection system for road traffic monitoring, showcasing enhanced detection and classification performance through training on diverse datasets and testing real-world traffic video sequences [24]. This approach provides a solid foundation for intelligent traffic management solutions.
Another study titled “Towards Real-time Traffic Flow Estimation using YOLO and SORT from Surveillance Video Footage” highlights the potential of utilizing surveillance video footage alongside computer vision techniques to accurately and efficiently estimate traffic flow [25]. This work effectively integrates the YOLOv4 and SORT algorithms to classify and track vehicles moving in various directions.
In another significant contribution, the study “Detection and tracking different types of cars with YOLO model combination and Deep Sort algorithm based on computer vision of traffic controlling” developed a traffic monitoring and control system using a combination of YOLOv4 and Deep Sort algorithms to effectively detect, track, and classify multiple vehicle types from CCTV footage, achieving a detection accuracy of 87.98% with mean Average Precision (mAP) [26]. In similar approaches, Azimjonov and Özmen enhanced YOLO-based real-time vehicle detection and tracking, improving classification accuracy for highway traffic monitoring by integrating classifiers, thereby boosting performance from 57% to 95.45%.
Lin and Jhang performed an intelligent traffic-monitoring system that integrates YOLO and convolutional fuzzy neural networks for real-time vehicle classification and counting, demonstrating superior accuracy and performance across several datasets [27]. Abbasi, Shahraki, and Taherkordi comprehensively reviewed the deployment of DL for Network Traffic Monitoring and Analysis (NTMA), emphasizing its efficacy in managing complex network behaviors and significant data challenges [28]. Zhu et al. presented the MME-YOLO model, an innovative multi-sensor and multi-level enhanced convolutional network for robust vehicle detection in traffic surveillance, significantly improving detection performance under several conditions [29]. DL algorithms have also been successfully applied to dynamic traffic signal control and vehicle counting, proving their adaptability and effectiveness in real-world traffic management as was emphasized by Modi et al. in [30].
As the state of the art stands, integrating AI approaches in traffic monitoring tasks has shown promising results in addressing the challenges cities face with increasing vehicular populations. These studies highlight the limitations of traditional traffic management methods and the necessity for more advanced solutions. Lin et al. have integrated convolutional fuzzy neural networks with YOLO, achieving superior accuracy in vehicle classification and counting, which is crucial for intelligent traffic monitoring systems [31].
The current work extends the previous research presented in [32], introducing a comprehensive analysis of the YOLOR and Deep Sort algorithms across multiple monitoring stations with varying traffic, lighting, weather conditions, and vehicle classes. Unlike the aforementioned work, which focused primarily on initial testing at two stations, this research delves into the performance variations under different confidence levels and computational complexities. Additionally, the current study incorporates a novel application of algorithmic improvements, such as optimizing model configurations and including more complex vehicle classifications. These enhancements contribute to a more robust and scalable traffic monitoring solution, demonstrating significant advancements over the initial findings presented in previous studies.
In nations like Mexico, vehicle-counting tasks rely on traditional methods such as manual vehicle gauging and pressure sensors to measure vehicle quantities and estimate vehicular composition. These methods generally involve monitoring traffic flow at specific roadway points to ascertain daily traffic volumes. However, traditional vehicle gauging methods need to be improved, particularly regarding accuracy and reliability.
This research aims to test the performance of YOLOR and Deep Sort algorithms in vehicle-counting tasks. With this, it will be possible to propose a different way to optimize vehicle traffic registration and estimate the Annual Average Daily Traffic (AADT). Also, this research aims to address crucial gaps in the existing methodologies related to traffic management in Morelia, Mexico.

3. Methodology

3.1. YOLOR Algorithm

The YOLOR algorithm was selected for its distinctive capability to provide a unified representation that integrates both explicit and implicit knowledge [33], which is crucial for addressing the complexities inherent in traffic analysis in cities like the case of Morelia. By harnessing both types of knowledge, YOLOR can capture the intricate dynamics of traffic patterns, including interactions among vehicles, pedestrians, and environmental factors. This integrated representation facilitates more accurate and robust predictions.
A significant advantage of YOLOR lies in its flexibility and scalability across various tasks [34]. Its formulation, which combines explicit and implicit errors, enables YOLOR to adapt to a wide range of traffic-related tasks such as vehicle detection, classification, and traffic flow analysis. This versatility is essential for tackling the multi-faceted challenges present in Morelia’s traffic conditions, where multiple aspects of traffic management need to be addressed nowadays.
The YOLOR architecture allows it to learn and generalize from diverse data sources, making it well suited for complex urban environments [35,36]. Using DL techniques, YOLOR can process and analyze large volumes of traffic data in real-time, identifying patterns and anomalies that traditional methods might miss. This real-time processing capability is critical for dynamic traffic management, where immediate insights can lead to effective interventions.
The algorithm’s scalability means it can be deployed across different scales of traffic monitoring, from minor intersections to large urban networks. This scalability is achieved through its modular design, which allows for adding or removing components based on the specific requirements of the monitoring task. As traffic conditions in Morelia evolve, YOLOR can be adjusted to meet new demands, ensuring continuous improvement in traffic management.
In the context of neural networks, for a conventional network, the objective function can be formulated as follows:
$$y = f_\theta(x) + \epsilon, \qquad (1)$$
where $x$ is the observation, $\theta$ represents the set of parameters of the neural network, $f_\theta$ denotes the operation of the network, $\epsilon$ is the error term, and $y$ is the target of a given task. The goal is to minimize $\epsilon$ to make $f_\theta(x)$ as close to the target as possible. YOLOR proposes an enhanced formulation that integrates explicit and implicit knowledge, as denoted in (2):
$$y = f_\theta(x) + \epsilon + g_\phi\big(\epsilon_{ex}(x), \epsilon_{im}(z)\big), \qquad (2)$$
where $\epsilon_{ex}$ and $\epsilon_{im}$ model the explicit and implicit errors from observation $x$ and latent code $z$, respectively, and $g_\phi$ is a task-specific operation that combines information from both explicit and implicit knowledge [37]. In this study, the YOLOR algorithm is designed to handle both explicit and implicit errors, which are crucial for improving the accuracy and efficiency of vehicle detection in traffic monitoring systems. Explicit errors arise from observable discrepancies in the data, such as misclassified vehicles or incorrect bounding boxes, whereas implicit errors emanate from non-observed factors, including model assumptions and latent variables. By separating these errors, the algorithm can better capture the complex interactions within the traffic data, leading to more accurate predictions and robust performance in real-world scenarios. This separation enhances the algorithm's ability to adapt to varying traffic conditions, improving its overall efficacy in detecting and classifying vehicles. The biggest challenges in applying AI algorithms like YOLOR and Deep Sort for vehicle detection and tracking include dealing with occlusions, lighting variations, and complex dynamics in urban traffic. YOLOR effectively detects vehicles but can struggle with occlusions and overlapping objects, leading to missed detections. Deep Sort, on the other hand, can lose track of vehicles during abrupt changes in direction or speed, reducing tracking accuracy. These challenges highlight areas where further improvements are needed to enhance the robustness of these algorithms in real-world scenarios.
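To make this formulation more concrete, the following is a minimal PyTorch sketch of how implicit knowledge can be combined with explicit features in a detection head, loosely following the implicit-knowledge idea of the YOLOR paper; the module names, tensor shapes, and initialization values are illustrative assumptions rather than the exact official implementation.

```python
import torch
import torch.nn as nn

class ImplicitAdd(nn.Module):
    """Learnable implicit knowledge combined with explicit features by addition."""
    def __init__(self, channels: int):
        super().__init__()
        # The latent code z is a trainable tensor, independent of the input x.
        self.implicit = nn.Parameter(torch.zeros(1, channels, 1, 1))
        nn.init.normal_(self.implicit, std=0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # g_phi here is a simple addition of implicit knowledge to the features.
        return x + self.implicit

class ImplicitMul(nn.Module):
    """Learnable implicit knowledge combined with explicit features by multiplication."""
    def __init__(self, channels: int):
        super().__init__()
        self.implicit = nn.Parameter(torch.ones(1, channels, 1, 1))
        nn.init.normal_(self.implicit, mean=1.0, std=0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.implicit

# Example: refining a detection head's output with implicit knowledge,
# adding before the 1x1 convolution and multiplying after it.
features = torch.randn(1, 256, 20, 20)          # explicit representation f_theta(x)
implicit_a = ImplicitAdd(256)
head_conv = nn.Conv2d(256, 255, kernel_size=1)  # 255 = 3 anchors x (80 classes + 5)
implicit_m = ImplicitMul(255)

out = implicit_m(head_conv(implicit_a(features)))
print(out.shape)  # torch.Size([1, 255, 20, 20])
```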

3.2. Deep Sort Algorithm

The Deep Sort algorithm is an extension of the original SORT (Simple Online and Real-time Tracking) algorithm [38], which significantly improves tracking accuracy by incorporating DL features. Deep Sort combines motion and appearance information to track objects across frames in a video sequence. Deep Sort operates by following the subsequent steps:
  • Detection: In each frame, objects are detected, and their bounding boxes are output by an object detection model like YOLO or SSD.
  • Feature extraction: A CNN extracts features from each detected object to assist in distinguishing between different objects.
  • Prediction: For each track, the Kalman filter predicts the new state based on its previous state.
  • Association: The predicted states are matched with new detections based on a cost matrix that considers the predicted Kalman states and appearance features. The matching is optimized using the Hungarian algorithm.
  • Update: The Kalman filter updates the state of each matched track with the corresponding detection.
  • Track Management: Tracks are created for unmatched detections and are terminated if they remain unmatched for too long.
The essence of the Deep Sort algorithm lies in its state estimation and data association techniques, which are handled by the Kalman filter and the Hungarian algorithm, respectively. These two components play a pivotal role in the algorithm's operation.
The Kalman filter predicts and updates the state of each track with the following equations:
$$\text{Prediction:}\quad \hat{x}_{k|k-1} = F_k \hat{x}_{k-1|k-1} + B_k u_k,$$
where $\hat{x}_{k|k-1}$ is the predicted state, $F_k$ is the state transition model, $B_k$ is the control input model, and $u_k$ is the control vector:
$$\text{Update:}\quad \hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k \left( y_k - H_k \hat{x}_{k|k-1} \right),$$
where $\hat{x}_{k|k}$ is the updated state, $K_k$ is the Kalman gain, $y_k$ is the measurement, and $H_k$ is the measurement model.
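To illustrate these two steps, the following is a minimal NumPy sketch of a constant-velocity Kalman filter applied to a bounding-box center; the state layout and noise covariances are simplifying assumptions, not Deep Sort's exact parameterization.

```python
import numpy as np

# State: [cx, cy, vx, vy]; measurement: [cx, cy] (bounding-box center).
dt = 1.0
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)          # state transition model F_k
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)          # measurement model H_k
Q = np.eye(4) * 1e-2                               # process noise (assumed value)
R = np.eye(2) * 1.0                                # measurement noise (assumed value)

def predict(x, P):
    """Prediction step: x_hat_{k|k-1} = F x_hat_{k-1|k-1} (no control input here)."""
    x = F @ x
    P = F @ P @ F.T + Q
    return x, P

def update(x, P, y):
    """Update step: x_hat_{k|k} = x_hat_{k|k-1} + K (y - H x_hat_{k|k-1})."""
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain K_k
    x = x + K @ (y - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

# One frame of tracking: predict, then correct with the detector's measurement.
x = np.array([100.0, 200.0, 0.0, 0.0])  # initial track state
P = np.eye(4) * 10.0
x, P = predict(x, P)
x, P = update(x, P, y=np.array([103.0, 198.0]))
print(x[:2])  # updated box center
```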
The cost matrix for matching predicted tracks to new detections is calculated as follows:
$$C_{ij} = (1 - \lambda)\cdot \mathrm{Mahalanobis}(i, j) + \lambda \cdot \mathrm{CosineDistance}(i, j),$$
where $\lambda$ is a tuning parameter that balances the influence of the two distance metrics, $\mathrm{Mahalanobis}(i, j)$ calculates the Mahalanobis distance between the predicted state of track $i$ and detection $j$, and $\mathrm{CosineDistance}(i, j)$ measures the cosine distance between their appearance features.
The Hungarian algorithm is utilized to find the optimal assignment that minimizes the overall cost, defined by the cost matrix C. This algorithm ensures that each detection is uniquely matched to a track based on spatial and appearance data, facilitating robust object tracking [39].
Deep Sort thus enhances tracking performance by integrating appearance features extracted by a deep neural network with motion predictions made by the Kalman filter, while the Hungarian algorithm optimizes the tracking associations across frames.
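The association step can be sketched as follows, with SciPy's linear_sum_assignment serving as the Hungarian solver; the track and detection dictionaries, the gating threshold, and the value of lambda are illustrative assumptions standing in for Deep Sort's internal data structures.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cosine, mahalanobis

def association_cost(tracks, detections, lam=0.2):
    """Build C_ij = (1 - lam) * Mahalanobis(i, j) + lam * CosineDistance(i, j)."""
    C = np.zeros((len(tracks), len(detections)))
    for i, trk in enumerate(tracks):
        VI = np.linalg.inv(trk["cov"])                      # inverse covariance for track i
        for j, det in enumerate(detections):
            d_mah = mahalanobis(trk["mean"], det["center"], VI)
            d_app = cosine(trk["feature"], det["feature"])  # appearance distance
            C[i, j] = (1 - lam) * d_mah + lam * d_app
    return C

def associate(tracks, detections, max_cost=5.0):
    """Match tracks to detections by minimizing the total cost (Hungarian algorithm)."""
    C = association_cost(tracks, detections)
    rows, cols = linear_sum_assignment(C)
    # Keep only matches whose cost is below a gating threshold (assumed value).
    return [(i, j) for i, j in zip(rows, cols) if C[i, j] < max_cost]

# Example with one track and one detection (random appearance features).
rng = np.random.default_rng(0)
tracks = [{"mean": np.array([100.0, 200.0]), "cov": np.eye(2), "feature": rng.random(128)}]
detections = [{"center": np.array([102.0, 199.0]), "feature": rng.random(128)}]
print(associate(tracks, detections))  # e.g., [(0, 0)]
```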
The capacity of the YOLOR and Deep Sort algorithms to adapt to Morelia's traffic conditions was tested throughout this research, considering the unique characteristics and challenges of the city, such as the types of vehicles commonly found on its roads, typical traffic patterns, and specific congestion points.

3.3. Data Collection

The researchers identified six critical points in Morelia in the data collection process: “Calzada La Huerta”, “Camelinas Avenue”, “Calzada La Huerta-East”, “Francisco I. Madero West”, “Federal Hwy 14”, and “Calzada La Huerta-Cosmos Avenue”. These locations were selected based on various engineering considerations, particularly the significant traffic volume observed during peak hours. These roads are major arteries for accessing different parts of the city, with a high vehicular traffic density, making them ideal candidates for a comprehensive traffic management and vehicle counting study.
The selected critical points and their acronyms are denoted in Table 1.
The selection criteria included traffic volume, the diversity of vehicle types, traffic patterns, and the potential for congestion. These factors are crucial for developing a robust traffic monitoring system that can provide accurate and reliable data for traffic management purposes.
The geographical coordinates of the monitoring station points are detailed in Table 2. Furthermore, Figure 3 provides visual context with screenshots taken from the Google Maps application, highlighting the specific areas of interest within the city. These figures illustrate the layout and surrounding infrastructure of the monitoring stations, providing detailed context for planning and decision making.
High-definition cellphone cameras were installed at strategic positions at the monitoring stations. Specifically, a Google Pixel smartphone camera was used, equipped with a 12.2 MP 1/2.55” sensor, 1.4 µm pixels, a 77° field of view, an f/1.7 aperture lens, Dual Pixel Phase Detection Autofocus (PDAF), and Optical Image Stabilization (OIS). This setup captured video data at 1080p resolution and 30 frames per second (FPS), providing a comprehensive dataset that covers various traffic and lighting conditions, including peak and off-peak hours, weekdays, and weekends. The detailed analysis of these data provides valuable insights into traffic patterns and helps identify trends and anomalies.
The video footage from all monitoring station points was processed using advanced DL algorithms, with a specific focus on the YOLOR model. This model was instrumental in detecting, classifying, and counting vehicles in real-time, thereby providing crucial insights into traffic management. The data obtained from the YOLOR model were compared to the ground truth information, which was acquired through a manual counting process by the authors.
To leverage the strategic importance of “Calzada La Huerta” and “Camelinas Avenue”, this study was divided into two parts. The first part involved a complete analysis of the model's performance, varying both the confidence level and the model version (different model depths) to determine which combination best fits the vehicle-counting task. The second part analyzed the remaining monitoring stations (MS3, MS4, MS5, and MS6) in inference mode to corroborate the first stage of the methodology and to assess the capabilities of the YOLOR algorithm for counting vehicles in real scenarios with permanent vehicular flow.
At MS1, the camera was positioned at street level to evaluate the model’s performance in detecting and counting vehicles from a lateral perspective. In contrast, at MS2, the camera was placed 6 m above the road on a pedestrian bridge, providing an elevated and frontal view of the vehicular traffic. In both scenarios, a commercial tripod was utilized to ensure video stability.
The traffic density at both monitoring stations was high, with well-defined zones of changing traffic conditions. The typical composition of vehicles included passenger cars, trucks, motorcycles, buses, trailers, and bicycles. Notably, there were no significant variations in traffic patterns during the data collection period.

3.4. Inference Methodology

Each country generally has its own vehicular classification system; in Mexico, vehicles are classified based on the equivalent single axle load (ESAL), which results in a detailed and comprehensive categorization of vehicles. However, for this research, a simplified classification was adopted, focusing on five main types of vehicles: cars, trucks, buses, motorcycles, and bicycles. This general classification aligns with the categories found in the COCO dataset, which contains 80 different classes of objects, including the aforementioned vehicle types [40].
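Because the five vehicle categories map directly onto COCO classes, the detector's output can be filtered by class index, as in the brief sketch below; the class indices follow the standard 80-class COCO ordering, and the detection tuple format is an assumption about the post-processed YOLOR output rather than its native format.

```python
# Standard 80-class COCO indices for the vehicle types used in this study
# (person = 0, bicycle = 1, car = 2, motorcycle = 3, bus = 5, truck = 7).
VEHICLE_CLASSES = {1: "bicycle", 2: "car", 3: "motorcycle", 5: "bus", 7: "truck"}

def filter_vehicles(detections, conf_threshold=0.35):
    """Keep only vehicle detections above the confidence threshold.

    Each detection is assumed to be (x1, y1, x2, y2, confidence, class_id).
    """
    kept = []
    for x1, y1, x2, y2, conf, cls in detections:
        if int(cls) in VEHICLE_CLASSES and conf >= conf_threshold:
            kept.append(((x1, y1, x2, y2), conf, VEHICLE_CLASSES[int(cls)]))
    return kept

# Example: a car (class 2) and a person (class 0); only the car is kept.
dets = [(50, 80, 200, 180, 0.91, 2), (300, 60, 340, 190, 0.88, 0)]
print(filter_vehicles(dets))
```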
A critical aspect of this study is evaluating the YOLOR algorithm's performance in inference mode and in combination with the Deep Sort algorithm. Thus, the study utilized transfer learning and fine-tuning techniques, leveraging pre-trained weights. The YOLOR algorithm has been trained on various datasets to assess its effectiveness, and the COCO dataset was among those used in the training process. As a result, five sets of pre-trained weights are available for this customized analysis (YOLOR P6, YOLOR CSP, YOLOR CSP STAR, YOLOR CSP X STAR, and YOLOR CSP X), each corresponding to a different version of the model. Table 3 shows the distinctions between the model versions analyzed in the first stage of this research, reporting their GPU and CPU inference speed, average precision (APval), and AP50val obtained on the COCO dataset.
This study explored three pre-trained models, each differing in size and complexity. The models selected were YOLOR P6, YOLOR CSP, and YOLOR CSP X, representing the algorithm's small, large, and extra-large versions, respectively. YOLOR P6 is the most compact model, designed for efficiency with a reduced computational load. YOLOR CSP is a larger version that balances complexity and performance, offering improved detection accuracy. YOLOR CSP X, the largest model, provides the highest precision and robustness in vehicle detection and classification; however, it requires sophisticated hardware and software to reach its full performance, which in some cases is inefficient. The pre-trained weights associated with these models are crucial for customizing the YOLOR algorithm to the specific requirements of this research.
Another critical aspect of this research is how the vehicles are counted using the YOLOR and Deep Sort algorithms. To carry out the counting process, a virtual line is overlaid on the video footage of each monitoring station. This virtual line, also called the virtual counter, is defined for each monitoring station by two coordinates, $(x_1, y_1)$ and $(x_2, y_2)$, which indicate where the counter is placed. The virtual counter simulates the pneumatic counter employed in traditional methods; however, in this case, when a tracked vehicle (previously detected and classified by the YOLOR algorithm) crosses the virtual counter, its class is counted and stored in the register. This is where the Deep Sort algorithm enters the scene, since it tracks each detected object; in practice, the virtual counter registers the tracked element, providing an accurate way of counting the number of vehicles.
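A minimal sketch of this virtual-counter logic is shown below: a tracked vehicle is counted when its center changes sides of the segment defined by $(x_1, y_1)$ and $(x_2, y_2)$ between consecutive frames. The signed-area test and the per-track bookkeeping are illustrative assumptions about how such a counter can be implemented, not the exact code used in this study.

```python
from collections import defaultdict

def side_of_line(p, a, b):
    """Sign of the cross product: which side of segment a-b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

class VirtualCounter:
    def __init__(self, p1, p2):
        self.p1, self.p2 = p1, p2              # (x1, y1), (x2, y2) of the virtual line
        self.last_side = {}                    # track_id -> last known side of the line
        self.counts = defaultdict(int)         # vehicle class -> count

    def update(self, track_id, center, vehicle_class):
        """Call once per frame for every tracked vehicle (id assigned by Deep Sort)."""
        side = side_of_line(center, self.p1, self.p2)
        prev = self.last_side.get(track_id)
        if prev is not None and prev * side < 0:    # sign change => the line was crossed
            self.counts[vehicle_class] += 1
        self.last_side[track_id] = side

# Example: a car tracked across two frames, crossing a horizontal line at y = 400.
counter = VirtualCounter((0, 400), (1920, 400))
counter.update(track_id=7, center=(960, 390), vehicle_class="car")
counter.update(track_id=7, center=(960, 412), vehicle_class="car")
print(dict(counter.counts))  # {'car': 1}
```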

3.5. Computational Details

All the experiments and tests were performed in a personal workstation with the following features:
  • A 13th Gen Intel(R) Core(TM) i7-13620H processor at 2.40 GHz with 48 GB of Random Access Memory.
  • An NVIDIA GeForce RTX 4060 Laptop GPU (Max-Q Technology) with 3072 CUDA cores and 8188 MB of GDDR6 memory.
  • A GPU-accelerated Python environment with the following packages: cuDNN 8.2.1, CUDA Toolkit 11.3.1, Keras 2.4.3, Keras-GPU 2.4.3, TensorFlow-GPU 2.5.0, TensorFlow 2.5.0, and Python 3.7.16.
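A short sanity check of such a GPU-accelerated environment might look as follows; it assumes only the TensorFlow package listed above and simply confirms that CUDA-capable devices are visible to the framework.

```python
import sys
import tensorflow as tf

# Report the interpreter and framework versions used for the experiments.
print("Python:", sys.version.split()[0])
print("TensorFlow:", tf.__version__)

# List CUDA-capable GPUs visible to TensorFlow; an empty list means CPU-only execution.
gpus = tf.config.list_physical_devices("GPU")
print("GPUs detected:", [gpu.name for gpu in gpus])
```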

4. Results and Discussions

4.1. Model’s Performance in Its Different Variants

The main objective of this research is to test the performance of the combined YOLOR and Deep Sort algorithms for traffic management tasks, such as vehicle counting, in real-world scenarios. The first stage was to evaluate the available configurations of the YOLOR model. As mentioned earlier, MS1 and MS2 were evaluated using various confidence levels and model versions, ensuring a thorough analysis of the algorithms' performance under different conditions.
The parameter selection for performing this task is shown in Table 4 and Table 5. These tables detail the specific approaches for MS1 and MS2, including the parameter values and configurations applied during the analysis. Additionally, computational details (computing time) have been included to offer a more comprehensive understanding of the model's performance.
Table 4 and Table 5 show how computational time decreases as the confidence level increases across all model versions. This reduction in computing time can be attributed to the algorithm’s reduced need to distinguish between object classes when the confidence level is high. The algorithm makes more definitive decisions with a higher confidence threshold, reducing the computational load. Similarly, the average frames per second (FPS) also shows improvement with increased confidence levels. This improvement in the FPS is closely linked to the reduction in computing time; as the model performs fewer computations per frame, the processing speed for each frame increases. Consequently, higher confidence levels lead to a more efficient processing rate, reflected in the increased FPS.
A notable distinction among the YOLOR model versions is their size and corresponding impact on performance. The YOLOR P6 is the smallest and lightest version of the YOLOR architectures, resulting in a high average FPS due to its reduced computational requirements. In contrast, the YOLOR CSP X is the largest and most complex version, which, while offering greater accuracy, incurs a higher computational cost and, thus, a lower FPS. The YOLOR CSP represents a middle ground, balancing computational load and processing speed.
Figure 4 and Figure 5 provide the number of vehicles detected in each tested confidence level for each model version. Vehicle classes are abbreviated as follows: Cars (C), Trucks (T), Buses (B), Motorcycles (M), and Bicycles (Bi). These figures illustrate the variability in vehicle counts across different model versions and confidence levels. Despite this variability, the results fall within a 2% tolerance range as indicated by the standard deviation.
Given the differing scales of each vehicle class, the values were normalized to a range between 0 and 1 for consistency. The standard deviation (STD) for each class under each inference scenario was then calculated to assess the consistency of the detections (a minimal computation sketch is provided after the lists below). The results for MS1 are as follows:
  • STD C = 0.36925
  • STD T = 0.36299
  • STD B = 0.29011
  • STD M = 0.32064
  • STD Bi = 0.31207,
while the results for MS2 are as follows:
  • STD C = 0.37424
  • STD T = 0.34863
  • STD B = 0.28571
  • STD M = 0.40356
  • STD Bi = 0.32578.
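The sketch below illustrates how these values can be obtained: the per-class counts from the nine inference runs (three model versions by three confidence levels) are min-max normalized to [0, 1] and the standard deviation is computed per class. The count values shown are placeholders, not the study's actual detections.

```python
import numpy as np

# Rows: one inference run (model version + confidence level); columns: C, T, B, M, Bi.
# The numbers here are placeholders standing in for the detected counts.
counts = np.array([
    [1210, 255, 27, 110, 6],
    [1195, 250, 25, 105, 5],
    [1180, 245, 24, 100, 5],
    # ... remaining runs ...
], dtype=float)

# Min-max normalization per class so every column lies in [0, 1].
col_min, col_max = counts.min(axis=0), counts.max(axis=0)
normalized = (counts - col_min) / (col_max - col_min)

# Standard deviation per class over all inference runs.
std_per_class = normalized.std(axis=0)
for name, std in zip(["C", "T", "B", "M", "Bi"], std_per_class):
    print(f"STD {name} = {std:.5f}")
```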
In this study, the “Car” vehicle type is the predominant category within the vehicular distribution, reflecting the typical urban traffic composition, in which passenger cars comprise the majority of the traffic flow. However, it is crucial to underscore the significance of the “Trucks” category, as trucks substantially impact roadway infrastructure: they exert much greater stress on road surfaces than regular passenger cars due to their heavier weight and larger size, leading to increased deformations. A similar distribution pattern is observed at MS2, where the “Car” category remains the prevalent class and the “Truck” category is the second-most frequent, reinforcing the importance of considering the influence of trucks on traffic management and infrastructure maintenance.

4.2. Comparison Models

Once the model's performance for each stage had been computed, a statistical analysis was performed to identify the most stable performance across all tests. The distribution of the number of vehicles is visualized using boxplots, which categorize the variety of vehicles by confidence level across the classes: Cars (C), Trucks (T), Buses (B), and Motorcycles (M). These boxplots are depicted in Figure 6 and Figure 7, illustrating the inferences made by the YOLOR P6, YOLOR CSP, and YOLOR CSP X models. In these boxplots, each confidence level contains the distribution for the three mentioned models.
These plots give a sense of the density of the counts computed by the YOLOR P6, YOLOR CSP, and YOLOR CSP X models. They show no significant variation in the counts output by the models, demonstrating consistency in their performance and the absence of outliers. In addition, the same analysis was performed using the models as reference points; the results are illustrated in Figure 8 and Figure 9, where each model version contains the distribution for the three confidence levels, 0.35, 0.55, and 0.75.
Table 6 and Table 7 present the ground truth data for the monitoring stations MS1 and MS2. These tables contain the actual counts of each vehicle class as recorded in the video footage, serving as a reference for evaluating the performance of the YOLOR and Deep Sort algorithms. The number of vehicles detected was compared with the ground truth at varying confidence levels across different model versions to assess accuracy.
For this analysis, the average accuracy was calculated by counting the number of vehicles detected in each inference mode (defined by the confidence level and model version) and comparing it against the ground truth. The computed average accuracy results are displayed in Table 8 and Table 9 for MS1 and MS2. These comparisons allow for a detailed understanding of how well the models performed in various configurations and conditions, highlighting the strengths and limitations of each approach in accurately identifying and counting vehicles.
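The comparison can be summarized with the short sketch below, which computes a per-class accuracy as one minus the relative counting error and averages it over the classes; this definition and the count values are assumptions used for illustration, since the exact formula is not spelled out in the text.

```python
def class_accuracy(detected: int, ground_truth: int) -> float:
    """Per-class accuracy as 1 - relative counting error (an assumed definition)."""
    if ground_truth == 0:
        return 1.0 if detected == 0 else 0.0
    return max(0.0, 1.0 - abs(detected - ground_truth) / ground_truth)

def average_accuracy(detected_counts: dict, ground_truth_counts: dict) -> float:
    """Mean of the per-class accuracies over the classes present in the ground truth."""
    accs = [class_accuracy(detected_counts.get(c, 0), gt)
            for c, gt in ground_truth_counts.items()]
    return 100.0 * sum(accs) / len(accs)

# Placeholder counts for one inference configuration at a monitoring station.
ground_truth = {"C": 1200, "T": 250, "B": 30, "M": 110, "Bi": 8}
detected = {"C": 1190, "T": 245, "B": 28, "M": 100, "Bi": 6}
print(f"Average accuracy: {average_accuracy(detected, ground_truth):.2f}%")
```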
By evaluating the data in this manner, the study provides a comprehensive overview of the efficacy of different YOLOR model versions and confidence levels in real-world traffic scenarios. This detailed analysis is crucial for determining the reliability and precision of these models, offering valuable insights into their potential applications in traffic management systems. Table 8 and Table 9 detail the overall performance metrics for each model version across all analyzed vehicle classes. The data indicate that the model generally performed well in identifying and counting vehicles in classes C, T, and M, with the highest accuracy observed in these categories. However, the detection and classification of classes B and Bi are less consistent, leading to a slight decrease in the overall model performance. This discrepancy suggests that improvements are necessary in detecting these vehicle types to enhance the model’s overall accuracy.
At MS1, the best performance was achieved with the YOLOR CSP model configured at a confidence level of 0.35. Similarly, for MS2, the same model version and confidence level yielded superior results. This consistency indicates that the YOLOR CSP model at a 0.35 confidence threshold is optimal for the traffic conditions observed in this study.
Despite the generally robust performance of the YOLOR algorithm, there is a noticeable trend where lateral perspectives more accurately detected and classified vehicles in classes B, M, and Bi. In contrast, frontal and overhead views did not perform as well in these categories. This observation suggests that the camera angle and perspective might influence the model’s detection capabilities, highlighting a potential area for further refinement.
To give an overview of the model's performance at MS1 and MS2, Figure 10 and Figure 11 show a screenshot from each monitoring station, depicting how the counting process looks.
For a more extensive view of the model's performance, the following demos show a fragment of the video footage from MS1 and MS2, respectively.

4.3. Inferences from Monitoring Stations

Once all models had been analyzed and tested with different confidence levels, and the best model and confidence level had been selected, some changes were made to the analyzed inferences. First, the classes of interest were restricted to C, T, and B, because these are the classes the algorithms handle best and, from a pavement-capacity standpoint, the most important, since they determine the loads exerted on the pavement surface. Second, the virtual counter was divided into two counters, one responsible for counting vehicles in one direction and the other for counting vehicles in the opposite direction, as sketched below. Screenshots of all monitoring stations are illustrated in Figure 12.
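Building on the virtual-counter sketch in Section 3.4, the direction of travel can be recovered from the sign of the side change, so that two registers accumulate counts independently for each direction; the register names and the line placement are illustrative assumptions.

```python
from collections import defaultdict

class DirectionalCounter:
    """Counts tracked vehicles per class and per direction of crossing."""
    def __init__(self, p1, p2):
        self.p1, self.p2 = p1, p2
        self.last_side = {}
        # Two independent registers, one per travel direction.
        self.counts = {"direction_A": defaultdict(int), "direction_B": defaultdict(int)}

    def _side(self, p):
        a, b = self.p1, self.p2
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

    def update(self, track_id, center, vehicle_class):
        side = self._side(center)
        prev = self.last_side.get(track_id)
        if prev is not None and prev * side < 0:
            # The sign of the previous side tells us which way the vehicle crossed.
            direction = "direction_A" if prev < 0 else "direction_B"
            self.counts[direction][vehicle_class] += 1
        self.last_side[track_id] = side

# Example: two cars crossing a line at y = 400 in opposite directions.
counter = DirectionalCounter((0, 400), (1920, 400))
counter.update(1, (500, 390), "car"); counter.update(1, (500, 410), "car")
counter.update(2, (900, 410), "car"); counter.update(2, (900, 390), "car")
print({d: dict(c) for d, c in counter.counts.items()})
# {'direction_A': {'car': 1}, 'direction_B': {'car': 1}}
```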
The demos of the remaining monitoring stations can be found in:
With these findings and changes applied to the monitoring stations, the results are shown in Table 10, where the last column reports the accuracy of the total number of vehicles detected by the AI approach with respect to the number of vehicles registered as the ground truth.
From the demos and the results shown in Table 10, it can be seen that the combination of the analyzed algorithms solves the vehicle-counting task with high accuracy. It is important to note that the higher accuracy observed in Table 10 at MS1 and MS2 is due to the focus on the well-detected vehicle classes C, T, and B, while other classes that presented classification challenges were excluded, contributing to the improved performance metrics.
However, the combination of these algorithms admits several improvements to achieve the best performance in this type of analysis. First, the pre-trained weights used in this research correspond to a dataset with a very general vehicle classification, whereas the vehicular classification used in Mexico is different. This suggests correcting the vehicle types implicit in the pre-trained weights, which would be possible with a customized dataset. For instance, the models tend to confuse public transportation vehicles (called “combis” in Mexico) with C or T, which alters the number of detected elements during inference. Likewise, a variety of trucks used in Mexico are detected as C because this type of vehicle was not represented during the training process.
Another critical finding is that the models perform better when the camera angle allows them to observe the shape of the vehicles, enabling better vehicular classification. Considering this observation, the camera angle can be modified, or the model can be fine-tuned on frontal vehicle images to recognize vehicle types more accurately during inference. Additionally, weather conditions and light intensity significantly impact the recognition results: unfavorable weather and low-light scenarios increase the likelihood of false positives and negatives, particularly at night or in severe weather.
In some cases, the algorithm fails to resolve the class of a detected vehicle, causing the output to show two classes simultaneously and duplicating the vehicle's record, once per class. Regarding the registered number of vehicles, the AI approach shows an excellent accuracy of about 98% on average, which is relevant for traffic data management.
Comparing traditional vehicle-counting methods, such as visual observation and pneumatic sensors, with the AI-based approach reveals clear advantages in terms of efficiency, speed, and real-time adaptability. The AI approach, exemplified by the YOLOR model, demonstrates significant potential for improving traffic management. However, it is essential to continue enhancing these computational methods to further distinguish AI-based approaches from traditional techniques and to fully leverage their capabilities.

5. Conclusions

This research showcases that implementing the YOLOR algorithm combined with Deep Sort demonstrates a significant advancement in traffic monitoring systems. The integration of these technologies allows for accurate vehicle detection and classification, offering a reliable solution for real-time traffic management. This study highlights the importance of selecting appropriate model versions and confidence levels to optimize detection accuracy and processing speed, enabling traffic authorities to enhance road safety and manage resources effectively.
Furthermore, the findings underscore the need for continued research and development to refine these AI-based approaches. Future work can further improve the efficiency and reliability of these systems by addressing challenges such as vehicle misclassification and the impact of camera angles. The application of AI algorithms in Morelia sets a precedent for its potential adoption in other urban areas, contributing to the broader effort of developing intelligent traffic management solutions.
Regarding computational efficiency, the authors corroborated that YOLOR P6 exhibits fast processing times due to its lightweight architecture, making it suitable for real-time applications where speed is critical, while YOLOR CSP balances computational efficiency and accuracy and is suitable for scenarios requiring moderate speed and precision.
Another critical finding concerns the behavior of the confidence level: increasing the threshold reduces computing time and false positives, although in this study the best overall counting accuracy was obtained at the lowest tested threshold (0.35). This relationship underscores the importance of setting an optimal confidence threshold to balance detection accuracy and processing speed.
The variability in vehicle counts across different model versions and confidence levels is minimal, with standard deviations indicating a high level of consistency. This consistency ensures reliable vehicle detection and classification across varying configurations.
The findings from this research highlight the potential of YOLOR combined with Deep Sort for efficient and accurate traffic monitoring. By selecting appropriate model versions and confidence levels, traffic authorities can optimize resource allocation, enhance traffic flow, and improve road safety. This study not only demonstrates the novel application of the YOLOR and Deep Sort algorithms for real-time vehicle detection and classification but also establishes their viability under controlled conditions in vehicle-counting tasks, providing a reliable approach for enhancing traffic management systems.

Author Contributions

Conceptualization, J.A.G.-T.; methodology, J.A.G.-T.; software, J.A.G.-T. and G.T.-G.; validation, J.A.G.-T., F.J.D.-M., and G.T.-G.; formal analysis, J.A.G.-T., M.C.G.-C., and G.T.-G.; investigation, J.A.G.-T. and F.J.D.-M.; resources, J.A.G.-T., J.G.T.-R., and F.J.D.-M.; data curation, J.A.G.-T. and M.C.G.-C.; writing—original draft preparation, J.A.G.-T.; writing—review and editing, J.A.G.-T., F.J.D.-M., G.T.-G., and J.G.T.-R.; visualization, J.A.G.-T. and G.T.-G.; supervision, J.A.G.-T. and F.J.D.-M.; project administration, J.A.G.-T.; funding acquisition, J.A.G.-T. and F.J.D.-M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received support from the “MorelIA: Transformación Inteligente del Análisis y Conteo Vehicular para un Michoacán Innovador” project.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The authors do not have data to share.

Acknowledgments

The authors thank AULA CIMNE-Morelia, CIC UMSNH, and CONAHCYT for supporting this research.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Lilhore, U.K.; Imoize, A.L.; Li, C.T.; Simaiya, S.; Pani, S.K.; Goyal, N.; Kumar, A.; Lee, C.C. Design and implementation of an ML and IoT based adaptive traffic-management system for smart cities. Sensors 2022, 22, 2908. [Google Scholar] [CrossRef] [PubMed]
  2. Ali, F.; Ali, A.; Imran, M.; Naqvi, R.A.; Siddiqi, M.H.; Kwak, K.S. Traffic accident detection and condition analysis based on social networking data. Accid. Anal. Prev. 2021, 151, 105973. [Google Scholar] [CrossRef]
  3. Shrestha, R.; Oh, I.; Kim, S. A survey on operation concept, advancements, and challenging issues of urban air traffic management. Front. Future Transp. 2021, 2, 626935. [Google Scholar] [CrossRef]
  4. Bhatia, J.; Dave, R.; Bhayani, H.; Tanwar, S.; Nayyar, A. SDN-based real-time urban traffic analysis in VANET environment. Comput. Commun. 2020, 149, 162–175. [Google Scholar] [CrossRef]
  5. Modi, Y.; Teli, R.; Mehta, A.; Shah, K.; Shah, M. A comprehensive review on intelligent traffic management using machine learning algorithms. Innov. Infrastruct. Solut. 2022, 7, 128. [Google Scholar] [CrossRef]
  6. Tippannavar, S.; SD, Y. Real-time vehicle identification for improving the traffic management system-a review. J. Trends Comput. Sci. Smart Technol. 2023, 5, 323–342. [Google Scholar] [CrossRef]
  7. Sánchez, J.T.; Del Río, J.A.; Sánchez, A. Economic feasibility analysis for an electric public transportation system: Two cases of study in medium sized cities in Mexico. PLoS ONE 2022, 17, e0272363. [Google Scholar] [CrossRef]
  8. Monkkonen, P.; Canez, J.; Echavarria, A. Urban Planning in Mexico: The Cases of Hermosillo, Leon, Morelia, and Campeche; UCLA Ciudades: Los Angeles, CA, USA, 2020. [Google Scholar]
  9. Alveano-Aguerrebere, I.; Javier Ayvar-Campos, F.; Farvid, M.; Lusk, A. Bicycle facilities that address safety, crime, and economic development: Perceptions from Morelia, Mexico. Int. J. Environ. Res. Public Health 2018, 15, 1. [Google Scholar] [CrossRef]
  10. Azimjonov, J.; Özmen, A. A real-time vehicle detection and a novel vehicle tracking systems for estimating and monitoring traffic flow on highways. Adv. Eng. Inform. 2021, 50, 101393. [Google Scholar] [CrossRef]
  11. Salazar-Carrillo, J.; Torres-Ruiz, M.; Davis Jr, C.A.; Quintero, R.; Moreno-Ibarra, M.; Guzmán, G. Traffic congestion analysis based on a web-gis and data mining of traffic events from twitter. Sensors 2021, 21, 2964. [Google Scholar] [CrossRef]
  12. Rani, N.G.; Priya, N.H.; Ahilan, A.; Muthukumaran, N. LV-YOLO: Logistic vehicle speed detection and counting using deep learning based YOLO network. Signal Image Video Process. 2024, 18, 7419–7429. [Google Scholar] [CrossRef]
  13. Wu, X.; Liu, C.; Wang, L.; Bilal, M. Internet of things-enabled real-time health monitoring system using deep learning. Neural Comput. Appl. 2023, 35, 14565–14576. [Google Scholar] [CrossRef] [PubMed]
  14. Nadipour, F.; Sedaghat, S.; Amiri, E.; Rastad, M.S. A deep-learning-based SIoV framework in vehicle detection and counting system for Intelligent traffic management. In Proceedings of the IEEE 2024 8th International Conference on Smart Cities, Internet of Things and Applications (SCIoT), Shenzhen, China, 14–16 November 2024; pp. 49–54. [Google Scholar]
  15. Guzmán-Torres, J.A.; Domínguez-Mota, F.J.; Tinoco-Guerrero, G.; Tinoco-Ruíz, J.G.; Alonso-Guzmán, E.M. Extreme fine-tuning and explainable AI model for non-destructive prediction of concrete compressive strength, the case of ConcreteXAI dataset. Adv. Eng. Softw. 2024, 192, 103630. [Google Scholar] [CrossRef]
  16. Guzmán-Torres, J.A.; Naser, M.; Domínguez-Mota, F.J. Effective medium crack classification on laboratory concrete specimens via competitive machine learning. In Structures; Elsevier: Amsterdam, The Netherlands, 2022; Volume 37, pp. 858–870. [Google Scholar]
  17. Guzmán-Torres, J.A.; Morales-Rosales, L.A.; Algredo-Badillo, I.; Tinoco-Guerrero, G.; Lobato-Báez, M.; Melchor-Barriga, J.O. Deep learning techniques for multi-class classification of asphalt damage based on hamburg-wheel tracking test results. Case Stud. Constr. Mater. 2023, 19, e02378. [Google Scholar] [CrossRef]
  18. Guzmán-Torres, J.; Domínguez-Mota, F.; Tinoco-Guerrero, G.; Román-Gutierrez, R.; Arias-Rojas, H.; Naser, M. Explainable computational intelligence method to evaluate the damage on concrete surfaces compared to traditional visual inspection techniques. In Interpretable Machine Learning for the Analysis Design Assessment and Informed Decision Making for Civil Infrastructure; Elsevier: Amsterdam, The Netherlands, 2024; pp. 77–109. [Google Scholar]
  19. Vadhadiya, P.; Umar, S.A.; Reshma, S.; Akshitha, S.; Sravanthi, B.; Karthik, K. Vehicle Detection And Counting System Using OpenCV. In Proceedings of the IEEE 2024 10th International Conference on Communication and Signal Processing (ICCSP), Singapore, 12–14 April 2024; pp. 693–696. [Google Scholar]
  20. Lin, C.J.; Jeng, S.Y.; Lioa, H.W. A Real-Time Vehicle Counting, Speed Estimation, and Classification System Based on Virtual Detection Zone and YOLO. Math. Probl. Eng. 2021, 2021, 1577614. [Google Scholar] [CrossRef]
  21. Farid, A.; Hussain, F.; Khan, K.; Shahzad, M.; Khan, U.; Mahmood, Z. A fast and accurate real-time vehicle detection method using deep learning for unconstrained environments. Appl. Sci. 2023, 13, 3059. [Google Scholar] [CrossRef]
  22. Payghode, V.; Goyal, A.; Bhan, A.; Iyer, S.S.; Dubey, A.K. Object detection and activity recognition in video surveillance using neural networks. Int. J. Web Inf. Syst. 2023. ahead-of-print. [Google Scholar]
  23. Zuraimi, M.A.B.; Zaman, F.H.K. Vehicle detection and tracking using YOLO and DeepSORT. In Proceedings of the 2021 IEEE 11th IEEE Symposium on Computer Applications & Industrial Electronics (ISCAIE), Penang, Malaysia, 3–4 April 2021; pp. 23–29. [Google Scholar]
  24. Al-qaness, M.A.; Abbasi, A.A.; Fan, H.; Ibrahim, R.A.; Alsamhi, S.H.; Hawbani, A. An improved YOLO-based road traffic monitoring system. Computing 2021, 103, 211–230. [Google Scholar] [CrossRef]
  25. Algiriyage, N.; Prasanna, R.; Stock, K.; Hudson-Doyle, E.; Johnston, D.; Punchihewa, M.; Jayawardhana, S. Towards Real-time Traffic Flow Estimation using YOLO and SORT from Surveillance Video Footage. In Proceedings of the ISCRAM, Melbourne, Australia, 8–10 November 2021; pp. 40–48. [Google Scholar]
  26. Hasibuan, N.N.; Zarlis, M.; Efendi, S. Detection and tracking different type of cars with YOLO model combination and deep sort algorithm based on computer vision of traffic controlling. Sink. J. Dan Penelit. Tek. Inform. 2021, 5, 210–221. [Google Scholar]
  27. Lin, C.J.; Jhang, J.Y. Intelligent traffic-monitoring system based on YOLO and convolutional fuzzy neural networks. IEEE Access 2022, 10, 14120–14133. [Google Scholar] [CrossRef]
  28. Abbasi, M.; Shahraki, A.; Taherkordi, A. Deep learning for network traffic monitoring and analysis (NTMA): A survey. Comput. Commun. 2021, 170, 19–41. [Google Scholar] [CrossRef]
  29. Zhu, J.; Li, X.; Jin, P.; Xu, Q.; Sun, Z.; Song, X. Mme-yolo: Multi-sensor multi-level enhanced yolo for robust vehicle detection in traffic surveillance. Sensors 2020, 21, 27. [Google Scholar] [CrossRef]
  30. Chavhan, R.D.; Sambare, G. AI-Driven Traffic Management Systems In Smart Cities: A Review. Educ. Adm. Theory Pract. 2024, 30, 105–116. [Google Scholar]
  31. Kang, L.; Lu, Z.; Meng, L.; Gao, Z. YOLO-FA: Type-1 fuzzy attention based YOLO detector for vehicle detection. Expert Syst. Appl. 2024, 237, 121209. [Google Scholar] [CrossRef]
  32. Torres, J.A.G.; Mota, F.J.D.; Guerrero, G.T.; Ruíz, J.G.T. Leveraging Deep Learning for Enhanced Traffic Counting and Efficiency in Morelia México: An Artificial Intelligence Approach. In Proceedings of the International Conference on Recent Advances in Transportation (ICRAT 2024), Singapore, 1–4 July 2024; Universidad del Mar: Puerto Escondido, México, 2024; pp. 1–8. [Google Scholar]
  33. Sun, H.; Lu, D.; Li, X.; Tan, J.; Zhao, J.; Hou, D. Research on multi-apparent defects detection of concrete bridges based on YOLOR. In Structures; Elsevier: Amsterdam, The Netherlands, 2024; Volume 65, p. 106735. [Google Scholar]
  34. Ferrante, G.S.; Vasconcelos Nakamura, L.H.; Sampaio, S.; Filho, G.P.R.; Meneguette, R.I. Evaluating YOLO architectures for detecting road killed endangered Brazilian animals. Sci. Rep. 2024, 14, 1353. [Google Scholar] [CrossRef] [PubMed]
  35. Huang, Y.F.; Liu, T.J.; Lin, C.A.; Liu, K.H. SOAda-YOLOR: Small Object Adaptive YOLOR Algorithm for Road Object Detection. In Proceedings of the IEEE 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Taipei, Taiwan, 31 October–3 November 2023; pp. 1652–1658. [Google Scholar]
  36. Bakirci, M. Enhancing Vehicle Detection in Intelligent Transportation Systems via Autonomous UAV Platform and YOLOv8 Integration. Appl. Soft Comput. 2024, 164, 112015. [Google Scholar]
  37. Wang, C.Y.; Yeh, I.H.; Liao, H.Y.M. You only learn one representation: Unified network for multiple tasks. arXiv 2021, arXiv:2105.04206. [Google Scholar]
  38. Bewley, A.; Ge, Z.; Ott, L.; Ramos, F.; Upcroft, B. Simple online and realtime tracking. In Proceedings of the 2016 IEEE international conference on image processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 3464–3468. [Google Scholar]
  39. Wojke, N.; Bewley, A.; Paulus, D. Simple online and realtime tracking with a deep association metric. In Proceedings of the 2017 IEEE international conference on image processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3645–3649. [Google Scholar]
  40. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft coco: Common objects in context. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part V 13. Springer: Berlin/Heidelberg, Germany, 2014; pp. 740–755. [Google Scholar]
Figure 1. Traditional pneumatic counter of vehicles used in México.
Figure 2. Traditional pneumatic counter of vehicles registering different types of cars.
Figure 3. Screenshots of the location of the monitoring station points in Morelia city, México.
Figure 4. Distribution of the number of vehicles at MS1.
Figure 5. Distribution of the number of vehicles at MS2.
Figure 6. Boxplot representing the number of vehicles counted at the MS1, considering the level of confidence as a class.
Figure 7. Boxplot representing the number of vehicles counted at the MS2, considering the level of confidence as a class.
Figure 8. Boxplot representing the number of vehicles counted at the MS1, considering the model as a class.
Figure 9. Boxplot representing the number of vehicles counted at the MS2, considering the model as a class.
Figure 10. Performance of the virtual counting in the MS1.
Figure 11. Performance of the virtual counting in the MS2.
Figure 12. Performance of the virtual counting in MS2, MS3, MS4, MS5 and MS6.
Table 1. Name of the monitoring stations and their acronyms used in the manuscript.

Monitoring Station Name         | Acronym | Light Conditions
Calzada La Huerta               | MS1     | Sunny
Camelinas Avenue                | MS2     | Cloudy
Calzada La Huerta-East          | MS3     | No sunlight
Francisco I. Madero West        | MS4     | Cloudy
Federal Hwy 14                  | MS5     | Cloudy
Calzada La Huerta-Cosmos Avenue | MS6     | Sunny
Table 2. Latitude and longitude locations for the analyzed monitoring stations.

Monitoring Station | Latitude  | Longitude
MS1                | 19.681368 | −101.217281
MS2                | 19.683006 | −101.216509
MS3                | 19.681368 | −101.217281
MS4                | 19.701529 | −101.237972
MS5                | 19.674628 | −101.220469
MS6                | 19.683016 | −101.216507
Table 3. YOLOR performance on its different versions [37].

Model Name       | FPS-GPU | FPS-CPU | Framework | APval | AP50val
YOLOR CSP X      | 28.6    | 1.83    | Pytorch   | 51.1% | 69.6%
YOLOR CSP X STAR | 30      | 1.76    | Pytorch   | 51.5% | 69.9%
YOLOR CSP STAR   | 38.1    | 2.86    | Pytorch   | 50%   | 68.7%
YOLOR CSP        | 38      | 2.77    | Pytorch   | 49.2% | 67.67%
YOLOR P6         | 20      | 1.57    | Pytorch   | 52.6% | 70.6%
Table 4. Analysis details of the MS1.

Model Version | Confidence Level | FPS Average | Computing Time
YOLOR CSP X   | 0.35 | 10.65850 | 10,746.900 s
YOLOR CSP X   | 0.55 | 10.97416 | 9972.058 s
YOLOR CSP X   | 0.75 | 11.89185 | 9071.210 s
YOLOR CSP     | 0.35 | 16.23829 | 10,449.018 s
YOLOR CSP     | 0.55 | 15.83025 | 7325.711 s
YOLOR CSP     | 0.75 | 16.72830 | 6417.746 s
YOLOR P6      | 0.35 | 14.93275 | 10,236.872 s
YOLOR P6      | 0.55 | 15.35486 | 9232.314 s
YOLOR P6      | 0.75 | 15.74308 | 7237.114 s
Table 5. Analysis details of the MS2.

Model Version | Confidence Level | FPS Average | Computing Time
YOLOR CSP X   | 0.35 | 11.08947 | 10,239.895 s
YOLOR CSP X   | 0.55 | 11.22915 | 10,602.192 s
YOLOR CSP X   | 0.75 | 12.20530 | 8821.168 s
YOLOR CSP     | 0.35 | 14.49380 | 7714.414 s
YOLOR CSP     | 0.55 | 15.68703 | 6878.323 s
YOLOR CSP     | 0.75 | 16.99784 | 5788.091 s
YOLOR P6      | 0.35 | 15.25878 | 7720.355 s
YOLOR P6      | 0.55 | 14.28251 | 7228.845 s
YOLOR P6      | 0.75 | 17.41921 | 5285.749 s
Table 6. Ground truth values for the MS1.

Class              | C    | T   | B  | M   | Bi
Number of vehicles | 1217 | 260 | 29 | 117 | 7
Table 7. Ground truth values for the MS2.

Class              | C    | T   | B  | M  | Bi
Number of vehicles | 1846 | 165 | 11 | 29 | 0
Table 8. Average accuracy computed by each model’s version at the MS1.

Model Version | Confidence Level | Average Accuracy (%)
YOLOR CSP X   | 0.35 | 79.86
YOLOR CSP X   | 0.55 | 74.96
YOLOR CSP X   | 0.75 | 66.32
YOLOR CSP     | 0.35 | 83.81
YOLOR CSP     | 0.55 | 76.77
YOLOR CSP     | 0.75 | 54.69
YOLOR P6      | 0.35 | 75.00
YOLOR P6      | 0.55 | 68.21
YOLOR P6      | 0.75 | 65.36
Table 9. Average accuracy computed by each model’s version at the MS2.

Model Version | Confidence Level | Average Accuracy (%)
YOLOR CSP X   | 0.35 | 66.31
YOLOR CSP X   | 0.55 | 75.62
YOLOR CSP X   | 0.75 | 52.89
YOLOR CSP     | 0.35 | 78.91
YOLOR CSP     | 0.55 | 70.17
YOLOR CSP     | 0.75 | 47.63
YOLOR P6      | 0.35 | 56.72
YOLOR P6      | 0.55 | 77.49
YOLOR P6      | 0.75 | 46.74
Table 10. Details of the accuracy in each monitoring station according to the analyzed class.

Monitoring Station | FPS Average | C-Accuracy (%) | T-Accuracy (%) | B-Accuracy (%) | Average Accuracy (%) | Total Accuracy (%)
MS1 | 16.2382 | 99.5895 | 98.0769 | 93.1034 | 96.2317 | 99.2031
MS2 | 15.6871 | 96.4247 | 97.5757 | 100     | 98.0001 | 96.5243
MS3 | 14.4385 | 98.8059 | 95.3352 | 74.5098 | 89.5503 | 98.0156
MS4 | 18.3436 | 96.0020 | 96.9387 | 60.3448 | 84.4285 | 99.3813
MS5 | 12.7365 | 99.7759 | 98.8081 | 96.7567 | 98.4469 | 99.4775
MS6 | 23.1031 | 98.0559 | 95.9596 | 63.4920 | 85.8358 | 99.3044
