Detection of Abnormal Pedestrian Flows with Automatic Contextualization Using Pre-Trained YOLO11n

Núñez-Vieyra, Adrián; Olivares-Rojas, Juan C.; Ferreira-Escutia, Rogelio; Méndez-Patiño, Arturo; Gutiérrez-Gnecchi, José A.; Reyes-Archundia, Enrique

doi:10.3390/mca30020044


National Technological Institute of Mexico (TecNM), Technological Institute of Morelia (ITM), Morelia 58120, Mexico

* Author to whom correspondence should be addressed: Juan C. Olivares-Rojas.

Math. Comput. Appl. 2025, 30(2), 44; https://doi.org/10.3390/mca30020044

Submission received: 3 February 2025 / Revised: 1 April 2025 / Accepted: 15 April 2025 / Published: 17 April 2025

(This article belongs to the Special Issue New Trends in Computational Intelligence and Applications 2024)


Abstract

Recently, video surveillance systems have evolved from expensive, human-operated monitoring systems that were useful only after a crime had been committed to systems that monitor 24/7, in real time, and with less and less human involvement. This is partly due to the use of smart cameras, improvements in the Internet, and AI-based algorithms that can classify and track objects in images and, in some cases, identify them as threats. Threats are often associated with abnormal or unexpected situations, such as the presence of unauthorized persons in a given place or at a given time, behavior by one or more persons that differs from that of the majority, or simply an unexpected number of people in the place. Recognizing such situations depends largely on the available information about their context, i.e., the place, date, and time of capture. In this work, we propose a model to automatically contextualize video capture scenarios, generating data such as location, date, time, and flow of people in the scene. A strategy to measure the accuracy of the data generated for such contextualization is also proposed. The pre-trained YOLO11n algorithm and the BoT-SORT algorithm gave the best results in person detection and tracking, respectively.

Keywords:

automatic contextualization; Bot-SORT; confusion matrix; pedestrian flow; tracking; video surveillance; YOLO; YOLO11n

1. Introduction

In recent years, video surveillance systems have become an important aid in solving all types of crimes. According to [1], some crimes in Mexico, such as street robbery, vehicle theft, and home burglary, have decreased in recent years, a trend that coincides with the growth in the installation of video surveillance systems.

A typical video surveillance system is defined as a technological tool based on the use of video cameras to support actions such as police deployment, emergency response, crime prevention, and the administration of justice [2]. In other words, video surveillance systems have historically been used only to respond to events after a crime has been committed.

However, over the past two decades, video surveillance systems have evolved from being generators of extensive databases, often useful only after a crime has been committed, to becoming intelligent systems capable of detecting and identifying threats, tracking them through multiple cameras, and generating summaries for use by law enforcement authorities. To achieve this automatic understanding, video surveillance systems must go through several stages, such as image capture, image segmentation, object detection and classification, and object tracking between images, among other processes that together are known as video analysis.

Video analysis seeks to generate an understanding of what is happening in one or more images based on the characteristics of the image itself (image semantics) and the behavior of the objects detected in it. Achieving this understanding involves solving several problems. For example, [3] notes that real-time monitoring with video surveillance systems still depends largely on human intervention, which introduces human error and makes these systems less efficient. Likewise, [4] explains that in many scenarios decision-making is still faster and more accurate with human surveillance than with any type of automated video analysis, mainly because of the heterogeneous nature of the context in which monitoring is carried out (place and time) [5]: images may be taken indoors or outdoors, there may be few or many people or other objects present, cameras may be poorly oriented, lighting may be poor, etc.

Users of video surveillance systems generally pay attention to the appearance of people in the scene. Nowadays, most video surveillance systems trigger alerts when they detect moving people or pets, which means that they include algorithms capable of detecting, classifying, and tracking certain types of objects. However, most of the time, the presence of people or animals in a scene does not represent a risk and does not warrant an alert. Whether it does depends largely on the context of the image, i.e., where and when the detection occurred. A person detected outside a house at midday may seem normal, but the same detection early in the morning may not. Therefore, calculating the pedestrian flow and comparing it with the context of the captured images makes it possible to decide whether an alert should be issued.

In [6], a method is proposed to detect abnormal flows of people using an algorithm based on the mean displacement tracking model; the behavior of crowds and of the groups that form within them is also modeled. The authors also reference other models for pedestrian detection and tracking, such as the Gaussian mixture tracking model and the artificial neural network model. In [7], a Gaussian kernel-based integration model (GKIM) is proposed for the detection and localization of anomalies in pedestrian flows. In this context, an anomaly refers to unexpected behaviors such as pedestrians running, people lying on the ground, or very large or very small flows of people, situations that are difficult to detect given the heterogeneity of the pedestrian flow itself, contextual variations (date, time, place), and changes in viewing angle. In [8], a vehicle-counting system is implemented using the YOLOv5 algorithm for the detection and classification stages and the SORT algorithm for tracking, with an image-cropping strategy that reduces processing times without significantly reducing the accuracy of the results. Other research addresses anomaly detection in pedestrian zones using deep learning, where everything that moves through the area is considered an anomaly except certain known objects such as pedestrians, cars, skateboards, jeeps, etc., using techniques such as Mask R-CNN, DenseNet, or Histogram of Oriented Optical Flow (HOOF) [9,10]. In [11], a method is presented to detect abnormal events in crowds, focusing on violent events or natural disasters; optical flow is used to obtain motion vectors, which are then used to train a convolutional neural network that classifies whether or not an event is abnormal. In [12], a system is presented to count pedestrians with specific characteristics in specific regions of the image, focusing mainly on people who move little or remain static.
In [13], a passenger-detection system for a transportation service is proposed using deep learning; because it focuses only on people's heads, its processing requirements are significantly reduced. In [14], a study on pedestrian counting using YOLOv3 with Deep SORT is presented, concluding that occlusion is the main source of error in this type of measurement and that other factors, such as the speed, volume, and direction of the pedestrian flow, also have an influence. In [15], another pedestrian-counting system is presented that also identifies pedestrians' behavior when crossing a street and calculates their waiting time at intersections, so that traffic lights can be configured to operate more efficiently. Using a different technology, [16] presents a method to detect, track, and count pedestrians in real time based on infrared images. In [17], a system is presented for counting people grouped at different densities, with high occlusion, different backgrounds, and different camera orientations.

In this work, a model is proposed to automatically contextualize the scenarios where video surveillance is carried out, that is, to obtain, for a specific place and time, reference values that determine whether the current flow of people is normal, abnormally low, or abnormally high. An abnormal pedestrian flow occurs when the video surveillance system detects that, in a given time interval, the number of people who have passed in front of its cameras exceeds the values expected from the historical statistics of the place, or falls abnormally below them. In this work, these statistics are generated automatically and strengthened over time. We call this accumulation of data automatic contextualization of the environment; when we refer only to data from a specific time interval, we speak of contextualization of a particular scenario. To build the environment context, the system automatically counts and records pedestrian flows 24/7. To perform the counting, we use smart cameras for video capture, a pre-trained version of YOLO11n to detect and classify people, and, to avoid counting the same person more than once, a tracking algorithm called BoT-SORT. Both algorithms are available through the Ultralytics Python libraries. Data are recorded in MongoDB. This work also defines a criterion, based on the metrics of a confusion matrix, to measure how accurately people are counted without duplication. As part of this research, part of a university campus was contextualized using a video surveillance system in common-use corridors, where an accuracy of 82.6% was achieved in counting people without duplication; in less crowded environments, the accuracy is higher.

2. Materials and Methods

The diagram in Figure 1 shows the process of detecting abnormal pedestrian flows while simultaneously performing automatic contextualization, with the aim of reducing false positives when generating alerts. Alerts are graded according to a Degree of Abnormality, which is low when the pedestrian flow is close to the historical average and increases as the flow moves away from it. A threshold determines when an alert notification for abnormal pedestrian flow should be triggered. Since pedestrian flows at a single location vary over time, appropriate time periods for measuring them had to be established by trial and error; these periods will likely differ depending on the location where the system is applied. In this research, the algorithm is applied to predetermined fixed periods of half an hour.

The algorithm shown in Figure 1 can be summarized as follows. Historical data are loaded daily; if necessary, the update can be scheduled for a specific time, e.g., during the off-peak hours identified through automatic contextualization. The historical data are used to calculate averages for each period, which serve as a reference for comparing new pedestrian flows and deciding whether to generate an alert. In this case, 48 periods of thirty minutes each were used to cover a full day. At the end of the day, the data are updated, the historical averages are recalculated, and the process is repeated. If there are not enough data to generate a reliable context, which occurs during the first few days of the algorithm's execution, the system loads initial context values entered a priori based on the available knowledge of the location where pedestrian flow is measured. Once the data are loaded, the system starts a video capture cycle to detect people.
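The period bookkeeping described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes 30 min periods and one list of 48 counts per recorded day. The names `period_index` and `historical_averages`, and the `min_days` threshold that decides when there are "enough data" to replace the a priori values, are hypothetical; the paper does not state how many days are required.

```python
from datetime import datetime

PERIOD_MINUTES = 30                            # period length used in this work
PERIODS_PER_DAY = 24 * 60 // PERIOD_MINUTES    # 48 periods per day


def period_index(ts: datetime, period_minutes: int = PERIOD_MINUTES) -> int:
    """Return the period a timestamp falls in (0 = 00:00-00:29, 47 = 23:30-23:59)."""
    return (ts.hour * 60 + ts.minute) // period_minutes


def historical_averages(daily_counts, priors, min_days=7):
    """Average pedestrian flow H_p for every period p.

    daily_counts: one list per recorded day, each holding PERIODS_PER_DAY counts.
    priors: a priori reference values, used while there are too few days.
    min_days: hypothetical threshold for "enough data" (not given in the paper).
    """
    if len(daily_counts) < min_days:
        return list(priors)                    # fall back to the a priori context
    n_days = len(daily_counts)
    return [sum(day[p] for day in daily_counts) / n_days for p in range(len(priors))]
```

For example, `period_index(datetime(2025, 4, 1, 13, 45))` returns 27, i.e., the 13:30-13:59 period.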

2.1. Detection Algorithm

To detect people, we use YOLO11n, one of the pre-trained YOLO models provided by the Ultralytics Python library, because it gave the best results compared with the other pre-trained YOLO versions evaluated (Table 1).

YOLO (You Only Look Once) is an image-segmentation and object-detection model based on deep learning and computer vision that has been widely used because of its strong performance in terms of speed and accuracy. It uses a single neural network to find objects, and each predicted object is enclosed in a bounding box called a region of interest (ROI) [18,19]. Since YOLO11n is pre-trained, it can detect different types of objects. Continuing with the description of Figure 1, if the detected object is not a person, the system ignores it and continues capturing images. If a person is detected for the first time, that is, if he or she has just entered the frame, the registration process begins and the person is assigned a unique temporary ID that identifies them while they remain in the frame. This process is one of the stages of video analysis and is known as object tracking. The algorithm we use for this task is BoT-SORT (Bag of Tricks–Simple Online and Realtime Tracking), which is part of the Ultralytics libraries in Python. When the ID is assigned, the system counts the person and records the entry event in the MongoDB database. The pedestrian count is reset at the start of every time period p. In this investigation, the time periods p₁, p₂, p₃, …, p_k were set to 30 min each, with k = 48, which means that 48 scenarios are contextualized over a full day, each of which may or may not generate an alert for abnormal pedestrian flow. This period can be adjusted depending on the type of environment where the measurement is performed. The system then checks that the people counter C_p is still within the expected range ER_p for the pedestrian flow of period p, that is,

C_{p} \in {ER}_{p} = \left( H_{p} - \sigma_{p},\; H_{p} + \sigma_{p} \right)

(1)

where H_p is the historical average pedestrian flow for period p, and σ_p is the historical standard deviation for the same period, calculated as follows:

\sigma_{p} = \sqrt{\frac{1}{D}\sum_{d = 1}^{D} \left(F_{p,d} - H_{p}\right)^{2}}

(2)

where D is the number of days on which the pedestrian flow was measured for period p, F_{p,d} is the flow measured in period p on day d, and H_p is the historical average pedestrian flow for the same period p. Working and non-working days are treated separately, so H_p, σ_p, and D are computed only over days of the same type. In this way, as soon as the current pedestrian flow C_p exceeds the upper limit of the range ER_p in (1), an alert for abnormally high pedestrian flow is triggered; when the period ends, if the pedestrian flow is below the lower limit of ER_p, an alert for abnormally low pedestrian flow is triggered, as shown in the flowchart in Figure 1.
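The range check in Equations (1) and (2) can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: `flows` is assumed to hold the historical counts for one period p and a single day type (working or non-working), and the function names are hypothetical. The standard deviation divides by D, matching Equation (2).

```python
from math import sqrt


def period_stats(flows):
    """Return (H_p, sigma_p) for one period from its historical daily flows.

    flows: counts for period p over D days of the same type
    (working or non-working). Divides by D, as in Equation (2).
    """
    d = len(flows)
    h = sum(flows) / d
    sigma = sqrt(sum((f - h) ** 2 for f in flows) / d)
    return h, sigma


def check_flow(current, flows, period_ended=False):
    """Compare the current count C_p with ER_p = (H_p - sigma_p, H_p + sigma_p).

    Returns "high" as soon as C_p exceeds the upper limit, "low" only once the
    period has ended with C_p below the lower limit, and None otherwise.
    """
    h, sigma = period_stats(flows)
    if current > h + sigma:
        return "high"
    if period_ended and current < h - sigma:
        return "low"
    return None
```

The asymmetry mirrors the text: a high-flow alert can fire mid-period, whereas a low-flow alert is only meaningful once the period is over.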

At the end of each period, the pedestrian flow counter is reset and the system continues capturing video. However, when a detected person already has an ID, they should not be counted again but should be followed until they leave the scene. This is called tracking, or object re-identification, and it is essential for the reliability of abnormal pedestrian flow alerts, since otherwise many people could be counted multiple times.
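A minimal sketch of ID-based counting follows. Each tracker ID is counted the first time it appears, and the per-period counter is reset without forgetting IDs already seen; keeping the seen IDs across periods is our assumption about how a person who spans two periods is handled. The optional `count_people` function shows how IDs could be read from the Ultralytics tracker using the parameters reported in Section 3.1; it requires the `ultralytics` package and a video file. `PeriodCounter` and `count_people` are hypothetical names.

```python
class PeriodCounter:
    """Counts each tracker ID once; the count resets every period, seen IDs do not."""

    def __init__(self):
        self.seen_ids = set()   # IDs already counted (kept across periods)
        self.count = 0          # people counted in the current period

    def update(self, ids):
        """Register the IDs present in one frame and return those counted now."""
        new_ids = []
        for track_id in ids:
            if track_id not in self.seen_ids:
                self.seen_ids.add(track_id)
                new_ids.append(track_id)
        self.count += len(new_ids)
        return new_ids

    def reset_period(self):
        """Close the current period: return its count and start again from zero."""
        finished, self.count = self.count, 0
        return finished


def count_people(video_path, model_path="yolo11n.pt"):
    """Count people in a video with Ultralytics YOLO + BoT-SORT (needs `ultralytics`)."""
    from ultralytics import YOLO  # imported here so PeriodCounter works without it

    model = YOLO(model_path)
    counter = PeriodCounter()
    results = model.track(source=video_path, stream=True, persist=True,
                          classes=[0], tracker="botsort.yaml",
                          conf=0.75, iou=0.1, vid_stride=10, verbose=False)
    for result in results:
        if result.boxes.id is not None:  # None when no track IDs in this frame
            counter.update(result.boxes.id.int().cpu().tolist())
    return counter.count
```

In a long-running deployment, the set of seen IDs would need pruning, for example by discarding IDs whose exit event has already been recorded.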

2.2. Evaluation of the Tracking Method

This research proposes a way to measure the accuracy of tracking people once they have been detected and counted. The method is based on the metrics derived from a confusion matrix, which summarizes the performance of a classification algorithm through indicators such as precision, recall, specificity, and the F1 score:

Precision = TP/(TP + FP)

(3)

Recall = TP/(TP + FN)

(4)

Specificity = TN/(TN + FP)

(5)

F1 = 2 (Precision)(Recall)/(Precision + Recall)

(6)

The precision indicator (3) considers true positives (TP) and false positives (FP). True positives are people who are actually part of the pedestrian flow and received an identifier from the counting system at least once; false positives are people who received a new identifier even though they already had one, so the system counted them more than once.

Another commonly calculated indicator is sensitivity, or recall (4), which takes into account true positives (TP) and false negatives (FN). In this research, false negatives are people who, although part of the pedestrian flow, never received an identifier from the counting system, that is, they were not counted.

Specificity (5) was also calculated, which involves true negatives (TN) and false positives (FP). In this research, true negatives are people who were correctly assigned an ID and kept it for the entire time they were in the video sequence, so they were not counted repeatedly. Measuring true negatives in this way provides a consistent basis for comparing one tracking system with another.

Finally, the F1 score (6), the harmonic mean of precision and recall, combines both into a single metric that gives a more complete view of model performance.
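Equations (3) to (6) can be computed from the four counts with a few lines of Python. This is a sketch under the definitions given above; the guard that returns 0.0 on an empty denominator is our choice, and the counts used in any example are made up for illustration, not results from this study.

```python
def _ratio(num, den):
    """Safe division: 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def counting_metrics(tp, fp, fn, tn):
    """Precision, recall, specificity and F1 (Equations (3)-(6)) from raw counts.

    tp: people counted at least once; fp: extra IDs given to already-counted people;
    fn: people never counted; tn: people who kept a single ID the whole time.
    """
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    f1 = _ratio(2 * precision * recall, precision + recall)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}
```

For instance, with hypothetical counts TP = 8, FP = 2, FN = 2, and TN = 6, precision, recall, and F1 are all 0.8, and specificity is 0.75.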

2.3. Automatic Contextualization

To classify pedestrian flow, it is necessary to consider the context of the images, that is, the conditions under which they are captured: for example, the location (a public road, a shopping center, near a stadium), the time (early morning, lunchtime, dusk), the type of day (working day, non-working day, holiday), the time period to be evaluated, whether it is a restricted area (school, private neighborhood), etc.

Automatic contextualization organizes the data generated by the video surveillance system into fixed half-hour periods (this value is configurable) and provides continuous feedback to determine the degree of anomaly in pedestrian flows. In our case, we tested it in university hallways and were able to characterize the context in each one without using a priori data, thus generating reliable alerts about abnormal pedestrian flows.

The flowchart in Figure 1 shows that after a person is counted and/or tracked, the event is recorded (both the pedestrian's entry into and exit from the location) and the current pedestrian flow count is updated. The entry and exit events detected in the images are recorded using MongoDB and Python. Over time, these data accumulate and are used to calculate the values of the variables in Equations (1) and (2).
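The event recording step can be sketched as follows. The paper does not describe its MongoDB schema in the text, so the field names and the `make_event`/`record_event` helpers are hypothetical. Storing the period index and the day type with each event lets H_p and σ_p be aggregated later per period and per type of day. `record_event` works with any object that exposes `insert_one`, such as a pymongo `Collection`.

```python
from datetime import datetime


def make_event(track_id, kind, camera, ts: datetime, working_day, period_minutes=30):
    """Build one entry/exit event document (hypothetical schema)."""
    if kind not in ("entry", "exit"):
        raise ValueError("kind must be 'entry' or 'exit'")
    return {
        "track_id": track_id,
        "event": kind,
        "camera": camera,
        "timestamp": ts,
        "date": ts.date().isoformat(),
        "period": (ts.hour * 60 + ts.minute) // period_minutes,  # 0..47 for 30 min
        "working_day": working_day,   # lets H_p and sigma_p be computed per day type
    }


def record_event(collection, event):
    """Store the event; `collection` can be a pymongo Collection (uses insert_one)."""
    return collection.insert_one(event)
```

With pymongo, the collection could be obtained as `MongoClient()["pedestrians"]["events"]`; the database and collection names here are placeholders.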

Working and non-working dates had to be classified beforehand, and the duration of the evaluation periods had to be set; in this example, it was set at half an hour.
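One way to perform this prior classification is sketched below. The rule is an assumption, since the paper only states that dates were classified beforehand: Monday to Friday count as working days unless they appear in a holiday set supplied by the operator. The helper `split_by_day_type` (hypothetical) then separates historical flows so that working and non-working days are averaged independently.

```python
from datetime import date


def is_working_day(day: date, holidays=frozenset()):
    """Assumed rule: Monday-Friday are working days unless listed as holidays."""
    return day.weekday() < 5 and day not in holidays


def split_by_day_type(daily_flows, holidays=frozenset()):
    """Group {date: count} records into working and non-working lists (date order)."""
    groups = {"working": [], "non_working": []}
    for day, count in sorted(daily_flows.items()):
        key = "working" if is_working_day(day, holidays) else "non_working"
        groups[key].append(count)
    return groups
```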

3. Results

The results obtained in this research are shown below. First, different pre-trained versions of YOLO are evaluated to identify those that can perform reasonably well in real time with low computational resources. Next, the accuracy of the selected versions is presented to determine which performs best for this task. The accuracy of the proposed methodology is then compared with that of other research that also measures pedestrian flows. Finally, screenshots of the proposed application with the incorporated historical data are shown.

3.1. Pre-Trained YOLO Assessment

For person detection and tracking, we evaluated several pre-trained versions of YOLO from the Ultralytics libraries in Python (Table 1). This part of the research sought to determine which versions could maintain adequate performance, giving the impression of real-time video processing, on the equipment available for this research, which we consider mid-range. For the evaluation, a four-second MPEG-4 video with a resolution of 1280 × 720 pixels was used (Figure 2). Processing was performed on an Apple M1 Pro processor (64-bit, 3.2 GHz, 10 cores) with 16 GB of RAM running macOS Sonoma 14.6. The application recorded the local time before starting and again after the video had been fully consumed and processed; the difference between the two is reported in Table 1. The algorithm parameters were the same for every test: persist = True, verbose = False, classes = [0], tracker = “botsort.yaml”, IOU = 0.1, conf = 0.75, vid_stride = 10, where:

Persist tells the algorithm whether to carry tracks over from one frame to the next or simply to detect. If set to False, the algorithm assigns a new ID on each detection; if set to True, it first tries to relocate the person in the neighboring regions of the image, keeping the same ID.
Verbose is another algorithm flag. If set to true, a summary of the data is displayed on the screen, which can slow down the process. A false value prevents this output.
Tracker indicates the algorithm used to track objects; here, BoT-SORT, configured through the botsort.yaml file.
Intersection over Union (IOU) measures how much two bounding boxes overlap, such as a predicted box and a reference box. Its value ranges from 0 to 1, where 1 indicates perfect overlap; the IOU parameter sets the overlap threshold above which duplicate detections of the same object are suppressed. In this study, reducing the IOU value caused overlap errors, but these were insignificant for the automatic contextualization calculation, as a person has multiple opportunities to be detected.
Conf sets the minimum confidence score a detection must reach for the algorithm to accept it as a person.
Vid_stride allows you to skip frames in videos to speed up processing, at the expense of temporal resolution. A value of 1 processes all frames; higher values skip frames.
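The parameter list and timing procedure above can be sketched in Python. `iou` computes the overlap measure for boxes in (x1, y1, x2, y2) form; in Ultralytics, the `iou` keyword argument is the IoU threshold used by non-maximum suppression. `TRACK_KWARGS` collects the reported settings as Ultralytics keyword arguments, and `time_model` (hypothetical, requires `ultralytics` and a video file) measures the processing time of one model. The paper used local start and end times; `time.perf_counter`, a monotonic clock suited to measuring intervals, is our substitution.

```python
import time


def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Parameters reported in this section, spelled as Ultralytics keyword arguments.
TRACK_KWARGS = dict(persist=True, verbose=False, classes=[0],
                    tracker="botsort.yaml", iou=0.1, conf=0.75, vid_stride=10)


def elapsed_seconds(fn, *args, **kwargs):
    """Run fn and return (result, wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def time_model(model_path, video_path):
    """Time one pre-trained model on one video (requires `ultralytics`)."""
    from ultralytics import YOLO

    model = YOLO(model_path)

    def run():
        for _ in model.track(source=video_path, stream=True, **TRACK_KWARGS):
            pass  # consume every result; only the elapsed time matters here

    return elapsed_seconds(run)[1]
```

For example, two 2 × 2 boxes offset by one pixel in each direction share a 1 × 1 region, so their IoU is 1/7.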

YOLO versions that process the four-second video in around four seconds are good candidates for real-time processing, so the fastest version, YOLO11n, was chosen.

The official Ultralytics website provides YOLO in several model scales to cover different application needs. A suffix letter indicates the scale: for example, n (nano) denotes a version for environments with extremely limited resources, while x denotes the version with maximum precision, although its requirements are much higher [20].

3.2. Pedestrian Flow Assessment

To determine whether pedestrian flow is abnormal, it is necessary to count the number of people passing by the cameras over a period of time and compare it with stored statistical data. We therefore evaluate the accuracy of the pre-trained Ultralytics YOLO versions in people counting, considering only the five versions that can process video in real time with moderate hardware requirements. The results are shown in Table 2. The parameter values for the five algorithms analyzed were tracker = “botsort.yaml”, IOU = 0.1, and conf = 0.75.

A complication in measuring pedestrian flow is that people can occlude one another. The detection and tracking algorithms analyzed showed several problems that affected their indicators: the greater the pedestrian flow, the greater the probability of occlusions, and consequently the greater the number of false negatives and false positives.

The images in Figure 3 illustrate the case of low, medium, and high pedestrian flows obtained from the videos used for this test.

Based on the results in Table 2, YOLO11n was chosen for its accuracy, speed, and low requirements. Its performance, averaged over low-, medium-, and high-occlusion environments, was compared with other research that also considers pedestrian flow; the results are shown in Table 3. The data in Table 3 are taken directly from the respective publications, and it is important to note that each study was conducted for different purposes and in different scenarios and conditions. In the case of our proposal and its case studies, the tests were conducted in multiple environments and contexts, using up to 55 videos of 30 min each with varied pedestrian flows (a sample of the videos used, as well as the Python application code and a database script, are available in a repository: https://itecm-my.sharepoint.com/:f:/g/personal/adrian_nv_morelia_tecnm_mx/EgUaRneiQYJOm12KLQlaZ7kBpYQ__zSl0PyrBUK9v5MMMQ?e=GhXx0t, accessed on 1 April 2025).

Several studies, with different purposes, have counted people in a pedestrian flow. Table 3 shows some of them, together with three cases of our proposal. The methods analyzed mainly use modified versions of YOLO for detection and SORT for tracking. Most also improve the count by using the ID associated with each track, and they are fast enough to operate in real time. In this research, although tests were conducted with different pre-trained versions of YOLO, the fastest and most accurate were YOLOv8n and YOLO11n; a more detailed comparison of speed and accuracy is presented in Table 1 and Table 2, respectively. To determine the algorithm's accuracy in people counting, we propose an evaluation method based on whether the identifier assigned to each person on entering the frame is correctly retained until they exit. The objective of our proposal is to reduce false positives in alerts when detecting abnormal pedestrian flows, for which the image capture environment of the video surveillance system is automatically contextualized.

It is therefore important to clarify that the studies referenced in Table 3 calculate the accuracy of the detection algorithm, since that is what they propose, and calculate tracking accuracy separately. In our case, we calculate the accuracy of people counting, including correct ID retention during tracking. Some of the studies in Table 3 do not incorporate a tracking stage at all, as people counting is performed on individual images rather than on image sequences.

In Table 3, the third-to-last and second-to-last rows show two versions of our proposal, implemented with YOLOv8n and YOLO11n. These versions were tested with 12 half-hour videos with low pedestrian flow, where occlusions between people were only occasional, which reduces errors and facilitates counting. The last row presents the same proposal with YOLO11n, tested with 55 half-hour videos with varied pedestrian flows (high, medium, and low), where occlusions between people are more frequent.

3.3. Automatic Contextualization Results

For contextualization, a database was created in MongoDB version 7.0.12 with two collections: one to record events and another to record alerts for abnormal flow. During the first few weeks of running the abnormal pedestrian flow detection algorithm, the reference data, such as the historical mean and historical standard deviation, are weak, and the abnormal pedestrian flow alerts are not very accurate. However, as the days go by (for example, after a month of monitoring), the reference values are adjusted and strengthened, and increasingly reliable results are generated. Figure 4 and Figure 5 show examples of the algorithm running and how the historical average and historical standard deviation for each period adjust over time. Finally, when pedestrian flow exceeds pre-established limits with respect to the historical data (Figure 6), an alert is triggered indicating that pedestrian flow has exceeded the expected values. The alert level is a pre-established scale from zero to ten, where zero corresponds to current pedestrian flow values less than or equal to the historical average and ten to the historical average plus the standard deviation. This sensitivity can be adjusted to the dynamics of the environment where the measurements are made.
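The zero-to-ten alert scale can be sketched as a small Python function. The paper fixes only the two end points of the scale (level 0 at or below the historical average, level 10 at the average plus one standard deviation); the linear interpolation between them, the truncation to an integer, and the handling of a zero standard deviation are our assumptions. The function name `alert_level` is hypothetical, and the sketch covers only the high-flow side described above.

```python
def alert_level(current, hist_mean, hist_sigma, max_level=10):
    """Map the current flow C_p to an integer alert level from 0 to max_level.

    0 when C_p <= H_p, max_level when C_p >= H_p + sigma_p; values in between are
    interpolated linearly and truncated (an assumption, not stated in the paper).
    """
    if current <= hist_mean:
        return 0
    if hist_sigma <= 0:
        return max_level   # any excess over a zero-spread history is maximal
    level = max_level * (current - hist_mean) / hist_sigma
    return min(max_level, int(level))
```

For example, with H_p = 20 and σ_p = 5, a current count of 22.5 maps to level 5, and any count of 25 or more maps to level 10.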

4. Discussion

The results of this research indicate that pre-trained versions of the YOLO algorithm are very useful for specific tasks such as detecting and tracking people, without the need for expensive training equipment and with sufficient performance for less specialized tasks. In this case, several pre-trained versions of YOLO proved useful without requiring specialized hardware, which makes them well suited to real-time video surveillance. On the other hand, measuring pedestrian flows posed the challenge not only of measuring the detection accuracy of the algorithms used, but also of avoiding repeated counts of the same person and measuring how often they occur. Although the results of our methodology are not the highest, this may be because this research considered different contextual complexities: environments with low pedestrian flow, where 100% accuracy was achieved, as well as environments where weaknesses of the detection and tracking algorithms appeared as pedestrian flow increased. Moreover, the objective of this research is not the accuracy of the people count itself, but the classification of pedestrian flows to determine whether a flow is abnormal; knowing the counting error makes it possible to generate reliable alerts, which are also strengthened over time.

Future work includes integrating neighboring environments that provide feedback to each other, thus expanding the scope of this research. Another improvement to this methodology is to model environments in order to characterize them and start from contextual values closer to the real ones. The context could also be categorized automatically, so that the system itself determines how to quantify the type of context for each environment. For example, in this research only the a priori distinction between “working days” and “non-working days” was made; the system could instead make this distinction automatically and identify additional categories.

5. Conclusions

A methodology was presented and an application was developed for the detection of abnormal pedestrian flows using a pre-trained YOLO11n neural network, with automatic contextualization based on historical statistics of the environment. The application issues alerts if it detects a pedestrian flow above or below the expected values, using confidence thresholds based on the standard deviation. The system was validated in environments with different pedestrian flows, and a counting and tracking strategy based on the IDs assigned by the tracking system of the pre-trained network was validated, obtaining an accuracy of 82.6% averaged over environments with different pedestrian flow values.

Author Contributions

Conceptualization, A.N.-V. and J.C.O.-R.; methodology, A.N.-V. and R.F.-E.; software, A.N.-V. and A.M.-P.; validation, A.N.-V., J.A.G.-G. and J.C.O.-R.; formal analysis, E.R.-A. and R.F.-E.; investigation, A.N.-V., J.A.G.-G. and J.C.O.-R.; resources, J.A.G.-G. and E.R.-A.; data curation, A.N.-V. and J.C.O.-R.; writing—original draft preparation, A.N.-V. and R.F.-E.; writing—review and editing, R.F.-E., J.A.G.-G. and E.R.-A.; visualization, R.F.-E. and J.A.G.-G.; supervision, A.M.-P. and E.R.-A.; project administration, J.C.O.-R. and A.M.-P.; funding acquisition, J.C.O.-R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by TECNOLÓGICO NACIONAL DE MÉXICO, with support provided through project 21661.25-P.

Data Availability Statement

The original data presented in the study are openly available on a personal public drive at https://itecm-my.sharepoint.com/:f:/g/personal/adrian_nv_morelia_tecnm_mx/EgUaRneiQYJOm12KLQlaZ7kBpYQ__zSl0PyrBUK9v5MMMQ?e=GhXx0t, accessed on 1 April 2025.

Conflicts of Interest

The authors declare no conflict of interest.

References

[1] INEGI. Seguridad Pública y Justicia, INEGI, 30 March 2024. Available online: https://www.inegi.org.mx/temas/delitos/ (accessed on 7 May 2024).
[2] Sistema Nacional de Seguridad Pública. Norma Técnica Para Estandarizar las Características Técnicas y de Interoperabilidad de los Sistemas de Vídeo Vigilancia Para la Seguridad Pública; Centro Nacional de Información: Mexico City, Mexico, 2018.
[3] Edwin, J.; Greeshma, M.; Mithun-Haridas, T.P.; Supriya, M.H. Face Recognition based Surveillance System Using FaceNet and MTCNN on Jetson TX2. In Proceedings of the 2019 International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India, 15–16 March 2019; pp. 608–613.
[4] Pritch, Y.; Ratovitch, S.; Hendel, A.; Peleg, S. Clustered Synopsis of Surveillance Video. In Proceedings of the 2009 Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance, Genova, Italy, 2–4 September 2009.
[5] Ahmed, A.; Dogra, D.P.; Kar, S.; Roy, P.P. Trajectory-Based Surveillance Analysis: A Survey. IEEE Trans. Circuits Syst. Video Technol. 2019, 29, 1985–1997.
[6] Ma, J.; Song, W. Automatic clustering method of abnormal crowd flow pattern detection. Procedia Eng. 2013, 62, 509–518.
[7] Ullah, H.; Altamimi, A.B.; Uzair, M.; Ullah, M. Anomalous entities detection and localization in pedestrian flows. Neurocomputing 2018, 290, 74–86.
[8] Valencia, D.; Muñoz, E.; Muñoz-Añasco, M. Impact of the Preprocessing Stage on the Performance of Offline Automatic Vehicle Counting using YOLO. IEEE Lat. Am. Trans. 2024, 22, 723–732.
[9] Pustokhina, I.V.; Pustokhin, D.A.; Vaiyapuri, T.; Gupta, D.; Kumar, S.; Shankar, K. An automated deep learning-based anomaly detection in pedestrian walkways for vulnerable road users safety. Saf. Sci. 2021, 142, 105356.
[10] Ramalingam, M.S.G. Towards detection of abnormal event and reporting for pedestrian video surveillance. Int. J. Health Sci. 2022, 6, 4440–4455.
[11] Direkoglu, C. Abnormal Crowd Behavior Detection Using Motion Information Images and Convolutional Neural Networks. IEEE Access 2020, 8, 80408–80416.
[12] Li, J.; Huang, L.; Liu, C. An efficient self-learning people counting system. In Proceedings of the First Asian Conference on Pattern Recognition, Beijing, China, 28 November 2011.
[13] Kim, H.; Sohn, M.K.; Lee, S.H. Development of a Real-Time Automatic Passenger Counting System using Head Detection Based on Deep Learning. J. Inf. Process. Syst. 2022, 18, 428–442.
[14] Meli, W.; Lacy, F.; Ismail, Y. Video-Based Automated Pedestrians Counting Algorithms for Smart Cities. Int. J. Comput. Digit. Syst. 2020, 6, 1065–1078.
[15] Wickramasinghe, K.S.; Ganegoda, G.U. Pedestrian Detection, Tracking, Counting, Waiting Time Calculation and Trajectory Detection for Pedestrian Crossings Traffic light systems. In Proceedings of the 2020 20th International Conference on Advances in ICT for Emerging Regions (ICTer), Colombo, Sri Lanka, 4–7 November 2020.
[16] Shahzad, A.R.; Jalal, A. A Smart Surveillance System for Pedestrian Tracking and Counting using Template Matching. In Proceedings of the 2021 International Conference on Robotics and Automation in Industry (ICRAI), Rawalpindi, Pakistan, 26–27 October 2021.
[17] Pervaiz, M.; Ghadi, Y.; Gochoo, M.; Jalal, A.; Kamal, S.; Kim, D.S. A Smart Surveillance System for People Counting and Tracking Using Particle Flow and Modified SOM. Sustainability 2021, 13, 5367.
[18] Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
[19] Jocher, G.; Chaurasia, A.; Munawar, M.R. Ultralytics YOLO Docs, 05 07 2024. Available online: https://docs.ultralytics.com (accessed on 8 August 2024).
[20] Ultralytics Inc. Ultralytics YOLO Docs, 07 01 2025. Available online: https://docs.ultralytics.com/es/models/ (accessed on 17 January 2025).
[21] Núñez-Vieyra, A.; Ferreira-Escutia, R.; Olivares-Rojas, J.; Méndez-Patiño, A.; Gutiérrez-Gnecchi, J.; Reyes-Archundia, E. Automatic Detection of Abnormal Pedestrian Flows, Using Classification and Tracking with Pre-trained YOLOv8. In Proceedings of the Advances in Computational Intelligence MICAI 2024 International Workshop, Tonanzintla, México, 21–25 October 2024; Springer: Cham, Switzerland, 2024; pp. 87–98.
[22] Zhang, H.; Xu, J.; Zhang, X. Pedestrian tracking and counting based on YOLOv4 and DeepSORT in Subway Stations. In Proceedings of the 2023 35th Chinese Control and Decision Conference (CCDC), Yichang, China, 20–22 May 2023; pp. 732–737.
[23] Menon, A.; Omman, B.; Asha, S. Pedestrian Counting Using Yolo V3. In Proceedings of the 2021 International Conference on Innovative Trends in Information Technology (ICITIIT), Kottayam, India, 11–12 February 2021; pp. 1–9.
[24] Deleu, E.; Elez, S.; Gadodia, A.; Macvaugh, K.; Zhao, G. Using Deep Learning for Urban Pedestrian Counting. In Proceedings of the 2021 IEEE MIT Undergraduate Research Technology Conference (URTC), Cambridge, MA, USA, 8–10 October 2021; pp. 1–5.

Figure 1. Flowchart for detecting abnormal pedestrian flow based on automatic historical context.

Figure 2. Video used to evaluate the different Ultralytics YOLO versions.

Figure 3. Screenshots of the three videos used for the test. From left to right: high, medium, and low flows, respectively. The screenshots illustrate occlusions between people.

Figure 4. Screenshot of the application for alerts on abnormal pedestrian flow. During the first runs, the reference values tend to be high, but they normalize over the following days.

Figure 5. Screenshot of the application for alerts on abnormal pedestrian flow once the historical data and contextualization have been strengthened.

Figure 6. An alert is triggered when the pedestrian flow exceeds preset limits relative to the historical mean and historical standard deviation.

Table 1. Processing time of the pre-trained YOLO models evaluated on a four-second video, with detection restricted to people.

Model	Processing Time (Seconds)
YOLO11n	3.55
YOLOv8n	3.64
YOLOv5nu	4.06
YOLO11s	4.58
YOLOv9t	4.69
YOLOv10n	5.76
YOLOv9s	6.52
YOLO11m	7.24
YOLO11l	9.22
YOLOv10s	9.27
YOLOv9c	9.81
YOLOv10m	9.90
YOLOv9m	10.87
YOLOv10b	11.80
YOLOv10l	12.99
YOLO11x	13.28
YOLOv10x	15.20
YOLOv3u	17.57
YOLOv9e	17.80
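Dividing each processing time in Table 1 by the four-second clip length gives a real-time factor: values below 1 mean the model processed the clip faster than it plays. The sketch below applies that arithmetic to the Table 1 values. The real-time factor and the names `CLIP_SECONDS`, `PROCESSING_SECONDS`, and `real_time_factor` are our framing, not metrics or code reported by the authors, and the figures reflect a single clip on the authors' hardware.

```python
# Processing times in seconds, copied from Table 1 (four-second clip, person class only).
CLIP_SECONDS = 4.0
PROCESSING_SECONDS = {
    "YOLO11n": 3.55, "YOLOv8n": 3.64, "YOLOv5nu": 4.06, "YOLO11s": 4.58,
    "YOLOv9t": 4.69, "YOLOv10n": 5.76, "YOLOv9s": 6.52, "YOLO11m": 7.24,
    "YOLO11l": 9.22, "YOLOv10s": 9.27, "YOLOv9c": 9.81, "YOLOv10m": 9.90,
    "YOLOv9m": 10.87, "YOLOv10b": 11.80, "YOLOv10l": 12.99, "YOLO11x": 13.28,
    "YOLOv10x": 15.20, "YOLOv3u": 17.57, "YOLOv9e": 17.80,
}


def real_time_factor(processing_s: float, clip_s: float = CLIP_SECONDS) -> float:
    """Processing time divided by clip duration; below 1.0 means faster than real time."""
    return processing_s / clip_s


# Models sorted from fastest to slowest, keeping only those faster than real time.
faster_than_real_time = [
    model
    for model, seconds in sorted(PROCESSING_SECONDS.items(), key=lambda kv: kv[1])
    if real_time_factor(seconds) < 1.0
]
print(faster_than_real_time)  # ['YOLO11n', 'YOLOv8n']
```

Only YOLO11n (factor ≈ 0.89) and YOLOv8n (0.91) stay below 1; YOLOv5nu, the next fastest, is already slightly slower than real time (≈ 1.02).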

Table 2. Evaluation of the pedestrian-flow-counting system using different YOLO models. The processed video length and actual video length are expressed in seconds.

Model	Pedestrian Flow	Precision	Recall	Specificity	F1	Processed Video Length (s)	Actual Video Length (s)
YOLO11n	high	0.786	0.846	0.727	0.815	30.66	29
	medium	0.692	0.900	0.556	0.783	115.0	120
	low	1.000	0.900	1.000	0.947	117.0	120
YOLOv8n	high	0.600	0.667	0.429	0.632	30.70	29
	medium	0.688	1.000	0.444	0.815	118.0	120
	low	1.000	0.909	1.000	0.952	117.50	120
YOLOv5nu	high	0.583	0.778	0.167	0.667	31.33	29
	medium	0.643	0.900	0.545	0.750	116.0	120
	low	0.889	0.889	0.889	0.889	117.30	120
YOLO11s	high	0.615	0.889	0.375	0.727	36.00	29
	medium	0.688	1.000	0.545	0.815	153.0	120
	low	1.000	0.900	1.000	0.947	154.0	120
YOLOv9t	high	0.625	0.909	0.333	0.741	36.23	29
	medium	0.769	0.909	0.625	0.833	150.0	120
	low	1.000	0.900	1.000	0.947	152.0	120
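In Table 2, F1 is the harmonic mean of precision and recall, F1 = 2PR/(P + R), and specificity is TN/(TN + FP). The paper does not list raw counts, so the counts below are hypothetical values that happen to reproduce the YOLO11n "high" row after rounding to three decimals; they illustrate the arithmetic only and should not be read as the authors' data. The names `Counts`, `Metrics`, and `metrics` are likewise illustrative.

```python
from typing import NamedTuple


class Counts(NamedTuple):
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


class Metrics(NamedTuple):
    precision: float
    recall: float
    specificity: float
    f1: float


def metrics(c: Counts) -> Metrics:
    """Compute the four Table 2 metrics from confusion-matrix counts."""
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return Metrics(precision, recall, specificity, f1)


# Hypothetical counts (not from the paper) that reproduce the YOLO11n "high" row.
row = metrics(Counts(tp=11, fp=3, fn=2, tn=8))
print([round(x, 3) for x in row])  # [0.786, 0.846, 0.727, 0.815]
```

The same check confirms that the reported F1 values are consistent with the reported precision and recall, e.g., 2 · 0.786 · 0.846 / (0.786 + 0.846) ≈ 0.815.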

Table 3. Comparison with previous studies that counted pedestrian flows for various purposes and in various contexts. Our proposal includes three cases (the last three rows): the first two with low and medium pedestrian flow, and the last with scenarios of high occlusion and high pedestrian flow.

Research	Precision	Detection Model	Tracking Algorithm	Uses IDs to Compute Precision	Real-Time Processing
[13]	99.00%	Tiny-YOLOv3	KCF	No	Yes
[14]	83.00%	YOLOv3	DeepSORT	Yes	No
[15]	72.00%	Haar cascade	SORT	Yes	Yes
[16]	82.11%	Haar-like features	Kalman filter	Yes	Yes
[17]	86.90%	HOG	Motion trajectories	Yes	No
[21]	90.00%	YOLOv8n	ByteTrack	Yes	Yes
[22]	90.61%	YOLOv4_I	DeepSORT		Yes
[23]	96.10%	YOLOv3	---	No	No
[24]	83.00%	Deep learning	---	No	Yes
Our proposal ¹	89.60%	YOLOv8n	BoT-SORT	Yes	Yes
Our proposal ²	90.30%	YOLO11n	BoT-SORT	Yes	Yes
Our proposal ³	82.60%	YOLO11n	BoT-SORT	Yes	Yes

¹ and ² Tested with 12 videos of half an hour each with low pedestrian flow, where occlusions between people were very occasional, which reduces errors and facilitates counting. ³ Tested with 55 videos of half an hour each with varied pedestrian flows (high, medium, and low), where occlusions between people are more frequent.
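Counting by tracker ID, the strategy marked in the ID column of Table 3 and described in the Conclusions, means that each person contributes once however many frames they appear in. The following minimal sketch assumes that per-frame lists of track IDs are already available; with the Ultralytics tracking mode (`model.track(...)`, which uses BoT-SORT by default), these IDs are exposed through `results[0].boxes.id`, which is `None` when no tracks are active. The function name `count_unique_ids` and the `min_frames` filter, which discards IDs seen only briefly, are hypothetical additions and not part of the described method.

```python
from collections import Counter
from typing import Iterable, Optional, Sequence


def count_unique_ids(frames: Iterable[Sequence[Optional[int]]], min_frames: int = 1) -> int:
    """Count pedestrians as distinct tracker IDs seen in at least `min_frames` frames."""
    frames_seen: Counter[int] = Counter()
    for ids in frames:
        # Count each ID at most once per frame; detections without an ID are skipped.
        frames_seen.update({i for i in ids if i is not None})
    return sum(1 for n in frames_seen.values() if n >= min_frames)


# Four frames of hypothetical tracker output; ID 4 appears in a single frame.
frames = [[1, 2], [1, 2, 3], [2, 3, None], [4]]
print(count_unique_ids(frames))                # 4
print(count_unique_ids(frames, min_frames=2))  # 3
```

One plausible source of counting error under frequent occlusions is the tracker assigning a new ID to a person who reappears, which this scheme would count twice; a minimum-persistence filter such as `min_frames` trades some of those duplicates against missing people who are visible only briefly.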

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Núñez-Vieyra, A.; Olivares-Rojas, J.C.; Ferreira-Escutia, R.; Méndez-Patiño, A.; Gutiérrez-Gnecchi, J.A.; Reyes-Archundia, E. Detection of Abnormal Pedestrian Flows with Automatic Contextualization Using Pre-Trained YOLO11n. Math. Comput. Appl. 2025, 30, 44. https://doi.org/10.3390/mca30020044
