Search Results (114)

Search Parameters:
Keywords = crowd surveillance

21 pages, 2975 KB  
Article
ARGUS: An Autonomous Robotic Guard System for Uncovering Security Threats in Cyber-Physical Environments
by Edi Marian Timofte, Mihai Dimian, Alin Dan Potorac, Doru Balan, Daniel-Florin Hrițcan, Marcel Pușcașu and Ovidiu Chiraș
J. Cybersecur. Priv. 2025, 5(4), 78; https://doi.org/10.3390/jcp5040078 - 1 Oct 2025
Viewed by 422
Abstract
Cyber-physical infrastructures such as hospitals and smart campuses face hybrid threats that target both digital and physical domains. Traditional security solutions separate surveillance from network monitoring, leaving blind spots when attackers combine these vectors. This paper introduces ARGUS, an autonomous robotic platform designed to close this gap by correlating cyber and physical anomalies in real time. ARGUS integrates computer vision for facial and weapon detection with intrusion detection systems (Snort, Suricata) for monitoring malicious network activity. Operating through an edge-first microservice architecture, it ensures low latency and resilience without reliance on cloud services. Our evaluation covered five scenarios—access control, unauthorized entry, weapon detection, port scanning, and denial-of-service attacks—with each repeated ten times under varied conditions such as low light, occlusion, and crowding. Results show face recognition accuracy of 92.7% (500 samples), weapon detection accuracy of 89.3% (450 samples), and intrusion detection latency below one second, with minimal false positives. Audio analysis of high-risk sounds further enhanced situational awareness. Beyond performance, ARGUS addresses GDPR and ISO 27001 compliance and anticipates adversarial robustness. By unifying cyber and physical detection, ARGUS advances beyond state-of-the-art patrol robots, delivering comprehensive situational awareness and a practical path toward resilient, ethical robotic security. Full article
(This article belongs to the Special Issue Cybersecurity Risk Prediction, Assessment and Management)

15 pages, 700 KB  
Article
Promotion of Health-Harming Products on Instagram: Characterizing Strategies Boosting Audience Engagement with Cigar Marketing Messages
by Ganna Kostygina, Hy Tran, Chandler C. Carter and Sherry L. Emery
Int. J. Environ. Res. Public Health 2025, 22(8), 1285; https://doi.org/10.3390/ijerph22081285 - 17 Aug 2025
Viewed by 1047
Abstract
Social media promotion of harmful products (e.g., combustible tobacco) poses a public health threat. However, strategies that amplify exposure to and engagement with such content remain understudied. This study aims to characterize strategies boosting cigar, little cigar, and cigarillo (CLCC) marketing visibility, referrals, and engagement on Instagram. Using keyword rules, we collected publicly available CLCC-related Instagram posts from CrowdTangle for a six-year period from August 2016 to October 2021. Posts were categorized as commercial (e.g., posts by tobacco brands or vendors) or organic and were coded for consumer engagement (CE) strategies (e.g., presence of prompts to like/share) using a combination of machine learning methods and human coding. Temporal engagement trends were analyzed using metadata. A total of 320,488 CLCC-related public posts were collected, with 44.6% (n = 142,875) identified as overtly commercial. Of these, 33.5% (n = 47,832) contained CE cues, including discounts and giveaways for tagging peers, liking, commenting, or following CLCC brands and spokesperson/influencers accounts, as well as calls to participate in contests and polls. Overtly commercial CE messages consistently garnered more comments per post and likes per post than non-CE commercial posts. There was a significant upward trend in the rate of comments on CE posts, suggesting growing effectiveness in eliciting user interaction. The proliferation of and high level of engagement with cigar-related promotional messages on Instagram demonstrate the need for public health surveillance and regulation of the evolving strategies promoting CLCC marketing exposure, reach, and engagement on social media. Full article
(This article belongs to the Special Issue Evolving Role of Social Media in Health Communication)

16 pages, 2750 KB  
Article
Combining Object Detection, Super-Resolution GANs and Transformers to Facilitate Tick Identification Workflow from Crowdsourced Images on the eTick Platform
by Étienne Clabaut, Jérémie Bouffard and Jade Savage
Insects 2025, 16(8), 813; https://doi.org/10.3390/insects16080813 - 6 Aug 2025
Viewed by 628
Abstract
Ongoing changes in the distribution and abundance of several tick species of medical relevance in Canada have prompted the development of the eTick platform—an image-based crowd-sourcing public surveillance tool for Canada enabling rapid tick species identification by trained personnel, and public health guidance based on tick species and province of residence of the submitter. Considering that more than 100,000 images from over 73,500 identified records representing 25 tick species have been submitted to eTick since the public launch in 2018, a partial automation of the image processing workflow could save substantial human resources, especially as submission numbers have been steadily increasing since 2021. In this study, we evaluate an end-to-end artificial intelligence (AI) pipeline to support tick identification from eTick user-submitted images, characterized by heterogeneous quality and uncontrolled acquisition conditions. Our framework integrates (i) tick localization using a fine-tuned YOLOv7 object detection model, (ii) resolution enhancement of cropped images via super-resolution Generative Adversarial Networks (RealESRGAN and SwinIR), and (iii) image classification using deep convolutional (ResNet-50) and transformer-based (ViT) architectures across three datasets (12, 6, and 3 classes) of decreasing granularities in terms of taxonomic resolution, tick life stage, and specimen viewing angle. ViT consistently outperformed ResNet-50, especially in complex classification settings. The configuration yielding the best performance—relying on object detection without incorporating super-resolution—achieved a macro-averaged F1-score exceeding 86% in the 3-class model (Dermacentor sp., other species, bad images), with minimal critical misclassifications (0.7% of “other species” misclassified as Dermacentor). 
Given that Dermacentor ticks represent more than 60% of tick volume submitted on the eTick platform, the integration of a low granularity model in the processing workflow could save significant time while maintaining very high standards of identification accuracy. Our findings highlight the potential of combining modern AI methods to facilitate efficient and accurate tick image processing in community science platforms, while emphasizing the need to adapt model complexity and class resolution to task-specific constraints. Full article
(This article belongs to the Section Medical and Livestock Entomology)

26 pages, 829 KB  
Article
Enhanced Face Recognition in Crowded Environments with 2D/3D Features and Parallel Hybrid CNN-RNN Architecture with Stacked Auto-Encoder
by Samir Elloumi, Sahbi Bahroun, Sadok Ben Yahia and Mourad Kaddes
Big Data Cogn. Comput. 2025, 9(8), 191; https://doi.org/10.3390/bdcc9080191 - 22 Jul 2025
Viewed by 1043
Abstract
Face recognition (FR) in unconstrained conditions remains an open research topic and an ongoing challenge. The facial images exhibit diverse expressions, occlusions, variations in illumination, and heterogeneous backgrounds. This work aims to produce an accurate and robust system for enhanced Security and Surveillance. A parallel hybrid deep learning model for feature extraction and classification is proposed. An ensemble of three parallel extraction layer models learns the best representative features using CNN and RNN. 2D LBP and 3D Mesh LBP are computed on face images to extract image features as input to two RNNs. A stacked autoencoder (SAE) merged the feature vectors extracted from the three CNN-RNN parallel layers. We tested the designed 2D/3D CNN-RNN framework on four standard datasets. We achieved an accuracy of 98.9%. The hybrid deep learning model significantly improves FR against similar state-of-the-art methods. The proposed model was also tested on an unconstrained conditions human crowd dataset, and the results were very promising with an accuracy of 95%. Furthermore, our model shows an 11.5% improvement over similar hybrid CNN-RNN architectures, proving its robustness in complex environments where the face can undergo different transformations. Full article
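As a concrete illustration of the 2D LBP features the abstract relies on, the sketch below implements the basic 3×3 Local Binary Pattern operator in plain Python: each neighbor is thresholded against the center pixel and the resulting bits are packed clockwise into an 8-bit code. The patch values and function name are illustrative assumptions, not material from the paper.

```python
def lbp_code(patch):
    """8-bit LBP code for the center pixel of a 3x3 grayscale patch."""
    center = patch[1][1]
    # Neighbors taken clockwise starting from the top-left pixel.
    neighbors = [
        patch[0][0], patch[0][1], patch[0][2],
        patch[1][2],
        patch[2][2], patch[2][1], patch[2][0],
        patch[1][0],
    ]
    code = 0
    for bit, value in enumerate(neighbors):
        if value >= center:  # 1 if the neighbor is at least as bright
            code |= 1 << bit
    return code

# A face-recognition pipeline would histogram these codes over image cells.
patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
print(lbp_code(patch))  # → 241
```

A full pipeline of the kind the paper describes would compute such codes densely over each face image and feed histograms of them to the downstream networks.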

21 pages, 4875 KB  
Article
Improvement of SAM2 Algorithm Based on Kalman Filtering for Long-Term Video Object Segmentation
by Jun Yin, Fei Wu, Hao Su, Peng Huang and Yuetong Qixuan
Sensors 2025, 25(13), 4199; https://doi.org/10.3390/s25134199 - 5 Jul 2025
Cited by 1 | Viewed by 1344 | Correction
Abstract
The Segment Anything Model 2 (SAM2) has achieved state-of-the-art performance in pixel-level object segmentation for both static and dynamic visual content. Its streaming memory architecture maintains spatial context across video sequences, yet struggles with long-term tracking due to its static inference framework. SAM 2’s fixed temporal window approach indiscriminately retains historical frames, failing to account for frame quality or dynamic motion patterns. This leads to error propagation and tracking instability in challenging scenarios involving fast-moving objects, partial occlusions, or crowded environments. To overcome these limitations, this paper proposes SAM2Plus, a zero-shot enhancement framework that integrates Kalman filter prediction, dynamic quality thresholds, and adaptive memory management. The Kalman filter models object motion using physical constraints to predict trajectories and dynamically refine segmentation states, mitigating positional drift during occlusions or velocity changes. Dynamic thresholds, combined with multi-criteria evaluation metrics (e.g., motion coherence, appearance consistency), prioritize high-quality frames while adaptively balancing confidence scores and temporal smoothness. This reduces ambiguities among similar objects in complex scenes. SAM2Plus further employs an optimized memory system that prunes outdated or low-confidence entries and retains temporally coherent context, ensuring constant computational resources even for infinitely long videos. Extensive experiments on two video object segmentation (VOS) benchmarks demonstrate SAM2Plus’s superiority over SAM 2. It achieves an average improvement of 1.0 in J&F metrics across all 24 direct comparisons, with gains exceeding 2.3 points on SA-V and LVOS datasets for long-term tracking. The method delivers real-time performance and strong generalization without fine-tuning or additional parameters, effectively addressing occlusion recovery and viewpoint changes. 
By unifying motion-aware physics-based prediction with spatial segmentation, SAM2Plus bridges the gap between static and dynamic reasoning, offering a scalable solution for real-world applications such as autonomous driving and surveillance systems. Full article
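The Kalman-filter prediction the abstract describes can be sketched with a one-dimensional constant-velocity model: `predict` propagates a (position, velocity) state, and `update` fuses a measured position when the object is visible, so occluded frames fall back to prediction alone. This is a minimal illustration under assumed noise values, not the SAM2Plus implementation.

```python
def predict(state, cov, dt=1.0, q=1e-2):
    """Propagate (pos, vel) one step under a constant-velocity model."""
    pos, vel = state
    # Covariance propagation P' = F P F^T + Q with F = [[1, dt], [0, 1]].
    p00, p01, p10, p11 = cov
    p00n = p00 + dt * (p10 + p01) + dt * dt * p11 + q
    return (pos + vel * dt, vel), (p00n, p01 + dt * p11, p10 + dt * p11, p11 + q)

def update(state, cov, z, r=1.0):
    """Fuse a position measurement z with measurement noise variance r."""
    pos, vel = state
    p00, p01, p10, p11 = cov
    k0 = p00 / (p00 + r)           # Kalman gain for position
    k1 = p10 / (p00 + r)           # Kalman gain for velocity
    residual = z - pos
    state_new = (pos + k0 * residual, vel + k1 * residual)
    cov_new = ((1 - k0) * p00, (1 - k0) * p01,
               p10 - k1 * p00, p11 - k1 * p01)
    return state_new, cov_new

# Track an object moving ~2 px/frame; one frame is "occluded" (no measurement).
state, cov = (0.0, 2.0), (1.0, 0.0, 0.0, 1.0)
for z in [2.1, 3.9, None, 8.0]:    # None marks the occluded frame
    state, cov = predict(state, cov)
    if z is not None:
        state, cov = update(state, cov, z)
print(round(state[0], 1))  # the estimate stays on track through the gap
```

SAM2Plus extends this idea to image coordinates and couples the predicted state with segmentation confidence, but the predict/update cycle above is the core mechanism.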

28 pages, 4478 KB  
Review
Two-Dimensional Human Pose Estimation with Deep Learning: A Review
by Zheyu Zhang and Seong-Yoon Shin
Appl. Sci. 2025, 15(13), 7344; https://doi.org/10.3390/app15137344 - 30 Jun 2025
Viewed by 3293
Abstract
Two-dimensional human pose estimation (2D HPE) has become a fundamental task in computer vision, driven by growing demands in intelligent surveillance, sports analytics, and healthcare. The rapid advancement of deep learning has led to the development of numerous methods. However, the resulting diversity in research directions and model architectures has made systematic assessment and comparison difficult. This review presents a comprehensive overview of recent advances in 2D HPE, focusing on method classification, technical evolution, and performance evaluation. We classify mainstream approaches by task type (single-person vs. multi-person), output strategy (regression vs. heatmap), and architectural design (top-down vs. bottom-up) and analyze their respective strengths, limitations, and application scenarios. Additionally, we summarize commonly used evaluation metrics and benchmark datasets, such as MPII, COCO, LSP, OCHuman, and CrowdPose. A major contribution of this review is the detailed comparison of the top six models on each benchmark, highlighting their network architectures, input resolutions, evaluation results, and key innovations. In light of current challenges, we also outline future research directions, including model compression, occlusion handling, and cross-domain generalization. This review serves as a valuable reference for researchers seeking both foundational insights and practical guidance in 2D human pose estimation. Full article
(This article belongs to the Special Issue Future Information & Communication Engineering 2024)
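The regression-vs-heatmap distinction drawn in the review can be made concrete: heatmap-based methods output one 2D confidence map per keypoint and decode the keypoint location as the map's argmax. A toy decoder, with an assumed 3×3 map (real maps are typically 64×48 or larger):

```python
def keypoint_from_heatmap(heatmap):
    """Return (row, col) of the highest-confidence cell in a 2D map."""
    best, best_rc = float("-inf"), (0, 0)
    for r, row in enumerate(heatmap):
        for c, v in enumerate(row):
            if v > best:
                best, best_rc = v, (r, c)
    return best_rc

heatmap = [
    [0.01, 0.02, 0.01],
    [0.03, 0.90, 0.10],   # peak at row 1, col 1
    [0.02, 0.05, 0.02],
]
print(keypoint_from_heatmap(heatmap))  # → (1, 1)
```

Regression methods, by contrast, emit the (x, y) coordinates directly from a fully connected head, trading the heatmap's spatial supervision for a smaller output.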

19 pages, 4129 KB  
Article
Study on an Improved YOLOv7-Based Algorithm for Human Head Detection
by Dong Wu, Weidong Yan and Jingli Wang
Electronics 2025, 14(9), 1889; https://doi.org/10.3390/electronics14091889 - 7 May 2025
Viewed by 1120
Abstract
In response to the decreased accuracy in person detection caused by densely populated areas and mutual occlusions in public spaces, a human head-detection approach is employed to assist in detecting individuals. To address key issues in dense scenes—such as poor feature extraction, rough label assignment, and inefficient pooling—we improved the YOLOv7 network in three aspects: adding attention mechanisms, enhancing the receptive field, and applying multi-scale feature fusion. First, a large amount of surveillance video data from crowded public spaces was collected to compile a head-detection dataset. Then, based on YOLOv7, the network was optimized as follows: (1) a CBAM attention module was added to the neck section; (2) a Gaussian receptive field-based label-assignment strategy was implemented at the junction between the original feature-fusion module and the detection head; (3) the SPPFCSPC module was used to replace the multi-space pyramid pooling. By seamlessly uniting CBAM, RFLAGauss, and SPPFCSPC, we establish a novel collaborative optimization framework. Finally, experimental comparisons revealed that the improved model’s accuracy increased from 92.4% to 94.4%; recall improved from 90.5% to 93.9%; and inference speed increased from 87.2 frames per second to 94.2 frames per second. Compared with single-stage object-detection models such as YOLOv7 and YOLOv8, the model demonstrated superior accuracy and inference speed. Its inference speed also significantly outperforms that of Faster R-CNN, Mask R-CNN, DINOv2, and RT-DETRv2, markedly enhancing both small-object (head) detection performance and efficiency. Full article

10 pages, 2080 KB  
Proceeding Paper
Tunnel Traffic Enforcement Using Visual Computing and Field-Programmable Gate Array-Based Vehicle Detection and Tracking
by Yi-Chen Lin and Rey-Sern Lin
Eng. Proc. 2025, 92(1), 30; https://doi.org/10.3390/engproc2025092030 - 25 Apr 2025
Viewed by 452
Abstract
Tunnels are small, enclosed environments commonly found on highways, roads, and city streets, constructed to pass through mountains or beneath crowded urban areas. To prevent accidents in these confined environments, lane changes, slow driving, and speeding are prohibited on single- and multi-lane one-way roads. We developed a foreground detection algorithm based on the K-nearest neighbor (KNN) algorithm, a Gaussian mixture model, and 400 collected images. KNN was applied to the first 200 images, which were processed to remove inter-frame differences and estimate a high-quality background. Once the background was obtained, it was subtracted from new images to extract the vehicle foreground. The background image was processed using Canny edge detection and the Hough transform to compute road lines. At the same time, the oriented FAST and rotated BRIEF (ORB) algorithm was employed to track vehicles in the foreground image and determine their positions and lane deviations. This method enables the calculation of traffic flow and the detection of abnormal movements. We accelerated image processing using xfOpenCV on the PYNQ-Z2 platform with a Xilinx FPGA. The developed algorithm requires no pre-labeled training models and can automatically collect the required footage during the daytime. For real-time monitoring, the proposed algorithm computes ten times faster than YOLO-v2-tiny and uses less than 1% of YOLO's storage space. It operates stably on the PYNQ-Z2 platform with existing surveillance cameras, without additional hardware. These advantages make the system better suited to smart traffic management than the existing framework. Full article
(This article belongs to the Proceedings of 2024 IEEE 6th Eurasia Conference on IoT, Communication and Engineering)
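The two-step idea in the abstract (estimate a clean background from initial frames, then subtract it from new frames to isolate vehicles) can be sketched without the FPGA specifics. This toy version uses a per-pixel median in place of the paper's KNN/Gaussian-mixture estimator; the frame contents and threshold value are made-up assumptions.

```python
from statistics import median

def estimate_background(frames):
    """Per-pixel median across a stack of equally sized grayscale frames."""
    h, w = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(w)]
            for y in range(h)]

def foreground_mask(frame, background, threshold=20):
    """1 where the new frame differs from the background, else 0."""
    return [[1 if abs(p - b) > threshold else 0
             for p, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

# Three 2x3 "empty road" frames with sensor noise, then a frame with a vehicle.
frames = [
    [[100, 101, 99], [100, 100, 100]],
    [[101, 100, 100], [99, 100, 101]],
    [[100, 99, 101], [100, 101, 100]],
]
background = estimate_background(frames)
vehicle_frame = [[100, 180, 100], [100, 185, 100]]  # bright blob = vehicle
print(foreground_mask(vehicle_frame, background))   # → [[0, 1, 0], [0, 1, 0]]
```

The mask is then the input to the line-fitting and ORB-tracking stages described in the abstract.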

20 pages, 33320 KB  
Article
Two-Stage Video Violence Detection Framework Using GMFlow and CBAM-Enhanced ResNet3D
by Mohamed Mahmoud, Bilel Yagoub, Mostafa Farouk Senussi, Mahmoud Abdalla, Mahmoud Salaheldin Kasem and Hyun-Soo Kang
Mathematics 2025, 13(8), 1226; https://doi.org/10.3390/math13081226 - 8 Apr 2025
Cited by 1 | Viewed by 1646
Abstract
Video violence detection has gained significant attention in recent years due to its applications in surveillance and security. This paper proposes a two-stage framework for detecting violent actions in video sequences. The first stage leverages GMFlow, a pre-trained optical flow network, to capture the temporal motion between consecutive frames, effectively encoding motion dynamics. In the second stage, we integrate these optical flow images with RGB frames and feed them into a CBAM-enhanced ResNet3D network to capture complementary spatiotemporal features. The attention mechanism provided by CBAM enables the network to focus on the most relevant regions in the frames, improving the detection of violent actions. We evaluate the proposed framework on three widely used datasets: Hockey Fight, Crowd Violence, and UBI-Fight. Our experimental results demonstrate superior performance compared to several state-of-the-art methods, achieving AUC scores of 0.963 on UBI-Fight and accuracies of 97.5% and 94.0% on Hockey Fight and Crowd Violence, respectively. The proposed approach effectively combines GMFlow-generated optical flow with deep 3D convolutional networks, providing robust and efficient detection of violence in videos. Full article

17 pages, 1840 KB  
Article
Leveraging Artificial Intelligence to Predict Potential TB Hotspots at the Community Level in Bangui, Republic of Central Africa
by Kobto G. Koura, Sumbul Hashmi, Sonia Menon, Hervé G. Gando, Aziz K. Yamodo, Anne-Laure Budts, Vincent Meurrens, Saint-Cyr S. Koyato Lapelou, Olivia B. Mbitikon, Matthys Potgieter and Caroline Van Cauwelaert
Trop. Med. Infect. Dis. 2025, 10(4), 93; https://doi.org/10.3390/tropicalmed10040093 - 3 Apr 2025
Cited by 1 | Viewed by 1485
Abstract
Tuberculosis (TB) is a global health challenge, particularly in the Central African Republic (CAR), which is classified as a high TB burden country. In the CAR, factors like poverty, limited healthcare access, high HIV prevalence, malnutrition, inadequate sanitation, low measles vaccination coverage, and conflict-driven crowded living conditions elevate TB risk. Improved AI-driven surveillance is hypothesized to address under-reporting and underdiagnosis. Therefore, we created an epidemiological digital representation of TB in Bangui by employing passive data collection, spatial analysis using a 100 × 100 m grid, and mapping TB treatment services. Our approach included estimating undiagnosed TB cases through the integration of TB incidence, notification rates, and diagnostic data. High-resolution predictions are achieved by subdividing the area into smaller units while considering influencing variables within the Bayesian model. By designating moderate and high-risk hotspots, the model highlighted the potential for precise resource allocation in TB control. The strength of our model lies in its adaptability to overcome challenges, although this may have been to the detriment of precision in some areas. Research is envisioned to evaluate the model’s accuracy, and future research should consider exploring the integration of multidrug-resistant TB within the model. Full article
(This article belongs to the Section Infectious Diseases)

14 pages, 1399 KB  
Article
Obstacle-Aware Crowd Surveillance with Mobile Robots in Transportation Stations
by Yumin Choi and Hyunbum Kim
Sensors 2025, 25(2), 350; https://doi.org/10.3390/s25020350 - 9 Jan 2025
Cited by 1 | Viewed by 1282
Abstract
Modern transportation systems are operated through cooperating components, including mobile robots, smart vehicles, and intelligent management. Surveillance by mobile robots is expected to be especially useful in complex transportation areas where high accuracy is required. In this paper, we introduce a crowd surveillance system that uses mobile robots and intelligent vehicles to provide obstacle avoidance in transportation stations, considering different robot movement strategies in an existing 2D area supported by line-based barriers and surveillance formations. We then formally define a problem that minimizes the distance traveled by a mobile robot while accounting for its speed and avoiding collisions as it moves to specific locations to carry out crowd surveillance. To solve this problem, we propose two schemes that improve surveillance even when speed is taken into account. Finally, we define conditions, configure various settings, and modify them to evaluate the schemes' performance. Full article
(This article belongs to the Special Issue Intelligent Service Robot Based on Sensors Technology)

6 pages, 3621 KB  
Proceeding Paper
Indoor Received Signal Strength Indicator Measurements for Device-Free Target Sensing
by Alex Zhindon-Romero, Cesar Vargas-Rosales and Fidel Rodriguez-Corbo
Eng. Proc. 2024, 82(1), 44; https://doi.org/10.3390/ecsa-11-20491 - 26 Nov 2024
Viewed by 435
Abstract
For applications such as home surveillance systems and assisted living for elderly care, sensing capabilities are essential for tasks such as locating, determining the approximate position of a person, or identifying the status of a person (static or moving), since the effects caused by the presence of people can be captured in the power received by signals in an infrastructure deployed for these purposes. Human interference in Received Signal Strength Indicator (RSSI) measurements between different pairs of wireless nodes can vary depending on whether the target is moving or static. To test these ideas, an experiment was conducted using four nodes equipped with the ZigBee protocol in each corner of an empty 6.9 m × 8.1 m × 3.05 m room. These nodes were configured as routers, communicating with a coordinator outside the room that instructed the nodes to send back their pairwise RSSI measurements. The coordinator was connected to a computer in order to log the measurements, as well as the time at which the measurements were generated. The code was run for every iteration of the experiment, whether the target was static, moving, or when the number of targets was increased to five. The data were then statistically analyzed to extract patterns and other target relational parameters. There was a correlation between the change in the pairwise RSSI and the path described by the target when moving through the room. The data presented by the results can aid algorithms for device-free localization and crowd classification, with a low infrastructure cost for both, and shed light on the relevant characteristics correlated with the path and crowd size in indoor settings. Full article
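The statistical idea in the abstract can be shown in miniature: a static room yields a stable pairwise RSSI stream, while a moving person raises its variance, so even a simple variance threshold separates "static" from "moving" on one link. The RSSI values (in dBm) and the threshold below are illustrative assumptions, not the measured data from the experiment.

```python
from statistics import pvariance

def classify_link(rssi_samples, variance_threshold=1.0):
    """Label one node-pair RSSI stream as 'moving' or 'static'."""
    return "moving" if pvariance(rssi_samples) > variance_threshold else "static"

static_link = [-62, -62, -63, -62, -62, -63]   # empty room or static target
moving_link = [-62, -58, -65, -60, -67, -59]   # person walking through the link
print(classify_link(static_link), classify_link(moving_link))
```

Device-free localization then combines such per-link decisions across all node pairs to infer the target's path, as the abstract's correlation result suggests.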

21 pages, 5375 KB  
Article
PII-GCNet: Lightweight Multi-Modal CNN Network for Efficient Crowd Counting and Localization in UAV RGB-T Images
by Zuodong Niu, Huilong Pi, Donglin Jing and Dazheng Liu
Electronics 2024, 13(21), 4298; https://doi.org/10.3390/electronics13214298 - 31 Oct 2024
Cited by 1 | Viewed by 1365
Abstract
With the increasing need for real-time crowd evaluation in military surveillance, public safety, and event crowd management, crowd counting using unmanned aerial vehicle (UAV) captured images has emerged as an essential research topic. While conventional RGB-based methods have achieved significant success, their performance is severely hampered in low-light environments due to poor visibility. Integrating thermal infrared (TIR) images can address this issue, but existing RGB-T crowd counting networks, which employ multi-stream architectures, tend to introduce computational redundancy and excessive parameters, rendering them impractical for UAV applications constrained by limited onboard resources. To overcome these challenges, this research introduces an innovative, compact RGB-T framework designed to minimize redundant feature processing and improve multi-modal representation. The proposed approach introduces a Partial Information Interaction Convolution (PIIConv) module to selectively minimize redundant feature computations and a Global Collaborative Fusion (GCFusion) module to improve multi-modal feature representation through spatial attention mechanisms. Empirical findings indicate that the introduced network attains competitive results on the DroneRGBT dataset while significantly reducing floating-point operations (FLOPs) and improving inference speed across various computing platforms. This study’s significance is in providing a computationally efficient framework for RGB-T crowd counting that balances accuracy and resource efficiency, making it ideal for real-time UAV deployment. Full article
(This article belongs to the Special Issue Image Processing Based on Convolution Neural Network)

17 pages, 3025 KB  
Article
A Deep Learning Framework for Real-Time Bird Detection and Its Implications for Reducing Bird Strike Incidents
by Najiba Said Hamed Alzadjali, Sundaravadivazhagan Balasubaramainan, Charles Savarimuthu and Emanuel O. Rances
Sensors 2024, 24(17), 5455; https://doi.org/10.3390/s24175455 - 23 Aug 2024
Cited by 6 | Viewed by 6135
Abstract
Bird strikes are a serious aviation safety issue that can cause severe damage to aircraft components and even passenger fatalities. In response to this growing hazard, new and more efficient detection and prevention technologies are urgently needed. This paper presents a novel deep learning model developed to detect and mitigate bird strike risks in airport environments, thereby improving aircraft safety. Trained on an extensive database of bird images spanning multiple species and flight patterns, the study applies sophisticated image augmentation techniques to generate diverse aircraft-operation scenarios, ensuring that the model remains robust under varying conditions. The methodology centers on a spatiotemporal convolutional neural network that combines spatial attention structures with dynamic temporal processing to recognize flying birds precisely. A key feature of this research is its dual-focus architecture, which comprises two components: an attention-based temporal analysis network and a spatially aware convolutional neural network. This architecture can identify specific features embedded in a crowded, shifting backdrop, thereby lowering false positives and improving detection accuracy. The model's attention mechanisms further sharpen its focus by isolating the features of bird flight patterns that matter most. Results show that the proposed model outperforms existing bird detection systems in both accuracy and real-time responsiveness. An ablation study demonstrates the indispensable role of each component, confirming their synergistic effect on detection performance. The research substantiates the model's applicability as part of an airport bird strike surveillance system, offering a complement to existing prevention strategies.
This work applies distinctive deep learning features to deliver a scalable and reliable tool for addressing the bird strike problem. Full article
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
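The dual-focus design described in the abstract — a spatial attention gate over convolutional feature maps plus attention-weighted pooling over per-frame scores — can be sketched in a few lines. This is a minimal illustrative NumPy sketch, not the paper's actual architecture; the function names, pooling choices, and sigmoid gating are assumptions made for exposition.

```python
import numpy as np

def spatial_attention(feature_map):
    """Illustrative spatial-attention gate (hypothetical, not the paper's module).

    feature_map: array of shape (C, H, W). Pools across channels (mean and
    max), forms a single attention map via a sigmoid, and reweights every
    channel by it, so salient locations are emphasized."""
    avg_pool = feature_map.mean(axis=0)                  # (H, W)
    max_pool = feature_map.max(axis=0)                   # (H, W)
    attn = 1.0 / (1.0 + np.exp(-(avg_pool + max_pool)))  # sigmoid gate in (0, 1)
    return feature_map * attn[None, :, :]                # broadcast over channels

def temporal_attention(frame_scores):
    """Softmax attention pooling over per-frame detection scores of a clip."""
    e = np.exp(frame_scores - frame_scores.max())        # stable softmax
    weights = e / e.sum()
    return float(weights @ frame_scores)                 # attention-pooled score
```

The sigmoid gate keeps every attention weight in (0, 1), so the spatial branch can only suppress background activations, never amplify them; the temporal branch weights frames with strong detections more heavily than noisy ones.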

24 pages, 8201 KB  
Article
Enhancing Sustainable Transportation Infrastructure Management: A High-Accuracy, FPGA-Based System for Emergency Vehicle Classification
by Pemila Mani, Pongiannan Rakkiya Goundar Komarasamy, Narayanamoorthi Rajamanickam, Mohammad Shorfuzzaman and Waleed Mohammed Abdelfattah
Sustainability 2024, 16(16), 6917; https://doi.org/10.3390/su16166917 - 12 Aug 2024
Cited by 6 | Viewed by 2286
Abstract
Traffic congestion is a prevalent problem in modern cities worldwide, affecting both large metropolises and smaller communities. In these crowded scenes, emergency vehicles tend to cluster tightly together, often occluding one another, which poses serious difficulties for traffic surveillance systems tasked with maintaining order and enforcing laws. Recent developments in machine learning for image processing have significantly increased the accuracy and effectiveness of emergency vehicle classification (EVC) systems, especially when combined with specialized hardware accelerators, and the widespread use of these technologies in safety and traffic management applications supports more sustainable transportation infrastructure management. Vehicle classification has traditionally been carried out manually by specialists, a laborious and subjective procedure that depends heavily on available expertise. Moreover, erroneous EVC can cause major operational problems, underscoring the need for a more dependable, precise, and efficient classification method. Although image processing for EVC draws on a variety of machine learning techniques, the process remains labor-intensive and time-consuming because current techniques frequently fail to capture each vehicle type adequately. To improve the sustainability of transportation infrastructure management, this article emphasizes the creation of a reliable, accurate hardware system for identifying emergency vehicles in complex contexts. The proposed system extracts features with a ResNet50 model on a Field Programmable Gate Array (FPGA), optimizes them with a multi-objective genetic algorithm (MOGA), and classifies vehicles with a CatBoost (CB) classifier.
Surpassing the previous state-of-the-art accuracy of 98%, the ResNet50-MOP-CB network achieved a classification accuracy of 99.87% across four primary categories of emergency vehicles. In tests on tablets, laptops, and smartphones, it demonstrated excellent accuracy, fast classification times, and robustness for real-world applications, classifying each image in 0.9 nanoseconds on average at a 96.65% accuracy rate. Full article
(This article belongs to the Special Issue Sustainable Transportation Infrastructure Management)
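The feature-extraction → genetic-algorithm optimization → boosted-classifier pipeline described in the abstract can be approximated with a toy sketch. The snippet below is a minimal single-objective genetic-algorithm feature selector with a nearest-centroid fitness standing in for both the multi-objective GA and the CatBoost classifier; every name, hyperparameter, and fitness choice here is an illustrative assumption, not the paper's implementation.

```python
import numpy as np

def ga_feature_select(X, y, n_keep=2, pop=20, gens=30, seed=0):
    """Toy GA feature selection (stand-in for the paper's MOGA + CatBoost).

    Each individual is a boolean mask over the feature columns of X; fitness
    is the training accuracy of a nearest-centroid classifier on the
    selected columns. Returns the best mask found."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    classes = np.unique(y)

    def fitness(mask):
        if mask.sum() == 0:
            return 0.0
        Xs = X[:, mask]
        # nearest-centroid accuracy as a cheap fitness proxy
        centroids = np.stack([Xs[y == c].mean(axis=0) for c in classes])
        d = ((Xs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        pred = classes[d.argmin(axis=1)]
        return (pred == y).mean()

    # initialize masks so roughly n_keep features are active per individual
    population = rng.random((pop, n_feat)) < (n_keep / n_feat)
    for _ in range(gens):
        scores = np.array([fitness(m) for m in population])
        order = np.argsort(scores)[::-1]
        parents = population[order[: pop // 2]]      # keep the fitter half
        children = []
        for _ in range(pop - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_feat)            # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(n_feat) < 0.05         # bit-flip mutation
            children.append(child ^ flip)
        population = np.vstack([parents, children])
    scores = np.array([fitness(m) for m in population])
    return population[scores.argmax()]
```

In the paper's setting the fitness would instead balance several objectives (e.g. accuracy against feature count) over ResNet50 embeddings, and the surviving features would feed a CatBoost model rather than a nearest-centroid rule; the skeleton of select–crossover–mutate is the same.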
