Advanced Methods and Applications with Deep Learning in Object Recognition

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Mathematics and Computer Science".

Deadline for manuscript submissions: 1 June 2024 | Viewed by 7300

Special Issue Editors


Guest Editor
Computer Science Department, Universidad Carlos III de Madrid, Madrid, Spain
Interests: information fusion; artificial intelligence; machine vision; autonomous vehicles

Guest Editor
Ecole Nationale Supérieure des Mines, Saint-Etienne, France
Interests: adaptive image processing; pattern analysis; stochastic geometry

Special Issue Information

Dear Colleagues,

Object detection and recognition are central tasks in computer vision, comprising the localization of object boundaries and their classification. They have become essential in many applications, such as search and rescue, warehouse logistics, and video surveillance or monitoring using UAVs, where the captured images are often low-resolution or blurred by camera motion. Additionally, conditions may differ widely across situations, making general solutions hard to achieve; fine-tuning is therefore essential in new scenarios.

The computer vision community has adopted deep-learning models over the last decade due to their superior performance with respect to classical methods. These models require high processing power (GPUs) for training on large datasets and can provide inferences in real time; typically, they employ convolutional neural networks (CNNs). Detectors are subdivided into two types: two-stage detectors, which aim for maximum accuracy at the potential cost of inference time, and one-stage detectors, which are oriented toward minimum inference time for real-time applications. Two-stage detectors are dominated by the R-CNN family (region-proposal CNNs), with Fast R-CNN, Faster R-CNN and Cascade R-CNN as representative solutions, while the YOLO family dominates one-stage detectors, SSD and RetinaNet being other popular algorithms in this category.

In recent years, Vision Transformers (ViTs) have also been applied to object detection and recognition tasks. ViT-based algorithms, such as DETR or YOLOS, rely on a self-attention mechanism that learns the relationships between the elements of a sequence, applying the transformer architecture to grids of image patches. Some of these detectors also make use of CNNs as a backbone for feature extraction, given their ability to automatically extract relevant features.

In addition, object detection is closely related to other open challenges in machine vision, such as Multi-Object Tracking (MOT), which involves both the detection and the tracking of objects of interest appearing in a video sequence. The goal in this case is not only to identify and locate the objects contained in each frame, but also to associate them across frames in order to maintain track continuity and follow their dynamics over time. This task is usually solved by combining object detection and data association algorithms; relevant examples can be found in the SORT (Simple Online and Realtime Tracking) family, such as DeepSORT, StrongSORT or OC-SORT.
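To illustrate the detection-plus-association structure of the trackers mentioned above, the following is a deliberately simplified sketch that matches detections across two frames by intersection-over-union (IoU). It is a hypothetical illustration only: SORT-family trackers additionally use Kalman-filter motion prediction and optimal (Hungarian) assignment rather than the greedy matching shown here.

```python
def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2); returns intersection-over-union in [0, 1].
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def associate(tracks, detections, iou_threshold=0.3):
    # Greedy data association: repeatedly pair the remaining track/detection
    # with the highest IoU, stopping below the threshold. Returns
    # (matches, unmatched_track_ids, unmatched_detection_ids); unmatched
    # detections would typically start new tracks, unmatched tracks age out.
    pairs = sorted(
        ((iou(t, d), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(detections)),
        reverse=True)
    matches, used_t, used_d = [], set(), set()
    for score, ti, di in pairs:
        if score < iou_threshold:
            break
        if ti not in used_t and di not in used_d:
            matches.append((ti, di))
            used_t.add(ti)
            used_d.add(di)
    unmatched_t = [ti for ti in range(len(tracks)) if ti not in used_t]
    unmatched_d = [di for di in range(len(detections)) if di not in used_d]
    return matches, unmatched_t, unmatched_d
```

A track that overlaps its previous box strongly keeps its identity, while a detection with no sufficiently overlapping track is treated as a new object; this is the core of the frame-to-frame continuity discussed above.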

Regarding evaluation, developing fair comparisons among different solutions is complex, considering the balance between accuracy and speed, the resolution of the input images, the configuration of the evaluation parameters, etc. Analyses rely on the available benchmarks and datasets, which are necessary to evaluate the performance of different architectures and configurations. Many authors have identified class imbalance as an additional obstacle to achieving high accuracy. To address it, other deep-learning architectures, such as GANs or autoencoders, can be combined with detectors to enhance the training phase by increasing the size and variety of the datasets, for instance, to improve the detection of very small objects. Additionally, learning can be improved in imbalanced situations by adapting the loss function to focus learning on hard examples and avoid a bias towards the numerous negative examples.
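A well-known instance of such a loss adaptation is the focal loss introduced with RetinaNet, which down-weights easy, well-classified examples so that training concentrates on hard ones. The following is a minimal sketch for a single binary-classification example (the parameter defaults gamma = 2 and alpha = 0.25 follow common practice; deep-learning frameworks provide optimized, batched versions):

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    # Binary focal loss for one example.
    # p: predicted probability of the positive class; y: label in {0, 1}.
    # The (1 - p_t)^gamma factor shrinks the loss of confident, correct
    # predictions, so the numerous easy negatives contribute little;
    # alpha re-balances the positive and negative classes.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With gamma = 0 and alpha = 0.5 this reduces to (half) the standard cross-entropy; increasing gamma progressively suppresses the contribution of easy examples, which is exactly the bias-avoidance mechanism described above.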

This Special Issue is aimed at contributions on these topics, showing the capability of novel mathematical algorithms, architectures and methods to improve object detection and recognition tasks, possibly combined with multi-object tracking, with an emphasis on new solutions and the analysis of their performance under challenging conditions in relevant applications.

Prof. Dr. Jesús García-Herrero
Prof. Dr. Johan Debayle
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • object detection and classification
  • multi-object tracking
  • deep-learning architectures
  • model training and evaluation
  • loss functions in learning
  • evaluation metrics and datasets
  • class imbalance
  • applications of object detection and object tracking
  • aerial object identification
  • transformers for object detection
  • edge AI for object detection
  • multi-modal object detection

Published Papers (5 papers)


Research

Jump to: Review

17 pages, 6257 KiB  
Article
Real-Time Motorbike Detection: AI on the Edge Perspective
by Awais Akhtar, Rehan Ahmed, Muhammad Haroon Yousaf and Sergio A. Velastin
Mathematics 2024, 12(7), 1103; https://doi.org/10.3390/math12071103 - 07 Apr 2024
Viewed by 442
Abstract
Motorbikes are an integral part of transportation in emerging countries, but unfortunately, motorbike users are also among the most vulnerable road users (VRUs) and are involved in a large number of accidents every year. Motorbike detection is therefore very important for proper traffic surveillance, road safety, and security. Most of the work related to bike detection has been carried out to improve accuracy; however, if this task is not performed in real time it loses practical significance, and little to none has been reported on real-time implementation. In this work, we have looked at multiple real-time deployable, cost-efficient solutions for motorbike detection using various state-of-the-art embedded edge devices. This paper discusses an investigation of the proposed methodology on five different embedded devices: Jetson Nano, Jetson TX2, Jetson Xavier, Intel Compute Stick, and Coral Dev Board. Running the highly compute-intensive object detection model on edge devices in real time is made possible by optimization. As a result, we have achieved inference rates on different devices that are twice as high as those of GPUs, with only a marginal drop in accuracy. Secondly, the baseline accuracy of motorbike detection has been improved by developing a custom network based on YOLOv5, introducing sparsity and depth reduction. Dataset augmentation has been applied at both the image and object levels to enhance the robustness of detection. We have achieved 99% accuracy compared to the previously reported 97% accuracy, with a better frame rate (FPS). Additionally, we have provided a performance comparison of motorbike detection on the different embedded edge devices, for practical implementation. Full article

28 pages, 5769 KiB  
Article
Exploring the Use of Contrastive Language-Image Pre-Training for Human Posture Classification: Insights from Yoga Pose Analysis
by Andrzej D. Dobrzycki, Ana M. Bernardos, Luca Bergesio, Andrzej Pomirski and Daniel Sáez-Trigueros
Mathematics 2024, 12(1), 76; https://doi.org/10.3390/math12010076 - 25 Dec 2023
Viewed by 865
Abstract
Accurate human posture classification in images and videos is crucial for automated applications across various fields, including work safety, physical rehabilitation, sports training, and daily assisted living. Recently, multimodal learning methods, such as Contrastive Language-Image Pretraining (CLIP), have advanced significantly in jointly understanding images and text. This study aims to assess the effectiveness of CLIP in classifying human postures, focusing on its application in yoga. Despite the initial limitations of the zero-shot approach, applying transfer learning on 15,301 images (real and synthetic) with 82 classes has shown promising results. The article describes the full fine-tuning procedure, including the choice of image description syntax, model selection and hyperparameter adjustment. The fine-tuned CLIP model, tested on 3826 images, achieves an accuracy of over 85%, surpassing the current state of the art of previous works on the same dataset by approximately 6%, with a training time 3.5 times lower than that needed to fine-tune a YOLOv8-based model. For more application-oriented scenarios, with smaller datasets of six postures each, containing 1301 and 401 training images, the fine-tuned models attain accuracies of 98.8% and 99.1%, respectively. Furthermore, our experiments indicate that training with as few as 20 images per pose can yield around 90% accuracy on a six-class dataset. This study demonstrates that this multimodal technique can be effectively used for yoga pose classification and, possibly, for human posture classification in general. Additionally, the CLIP inference time (around 7 ms) supports the model's integration into automated systems for posture evaluation, e.g., the development of a real-time personal yoga assistant for performance assessment. Full article

17 pages, 6799 KiB  
Article
Automatic Recognition of Indoor Fire and Combustible Material with Material-Auxiliary Fire Dataset
by Feifei Hou, Wenqing Zhao and Xinyu Fan
Mathematics 2024, 12(1), 54; https://doi.org/10.3390/math12010054 - 23 Dec 2023
Cited by 1 | Viewed by 734
Abstract
Early and timely fire detection within enclosed spaces notably diminishes the response time for emergency aid. Previous methods have mostly focused on singularly detecting either fire or combustible materials, rarely integrating both aspects, leading to a lack of a comprehensive understanding of indoor fire scenarios. Moreover, traditional fire load assessment methods, such as empirical formula-based assessment, are time-consuming and face challenges in diverse scenarios. In this paper, we collected a novel dataset of fire and materials, the Material-Auxiliary Fire Dataset (MAFD), and combined this dataset with deep learning to achieve both fire and material recognition and segmentation in indoor scenes. A sophisticated deep learning model, the Dual Attention Network (DANet), was specifically designed for image semantic segmentation to recognize fire and combustible material. The experimental analysis of our MAFD dataset demonstrated that our approach achieved an accuracy of 84.26% and outperformed the prevalent methods (e.g., PSPNet, CCNet, FCN, ISANet, OCRNet), making a significant contribution to fire safety technology and enhancing the capacity to identify potential hazards indoors. Full article

16 pages, 2929 KiB  
Article
Automatic Evaluation of Functional Movement Screening Based on Attention Mechanism and Score Distribution Prediction
by Xiuchun Lin, Tao Huang, Zhiqiang Ruan, Xuechao Yang, Zhide Chen, Guolong Zheng and Chen Feng
Mathematics 2023, 11(24), 4936; https://doi.org/10.3390/math11244936 - 12 Dec 2023
Viewed by 660
Abstract
Functional movement screening (FMS) is a crucial testing method that evaluates fundamental movement patterns in the human body and identifies functional limitations. However, due to the inherent complexity of human movements, the automated assessment of FMS poses significant challenges. Prior methodologies have struggled to effectively capture and model critical human features in video data. To address this challenge, this paper introduces an automatic assessment approach for FMS by leveraging deep learning techniques. The proposed method harnesses an I3D network to extract spatiotemporal video features across various scales and levels. Additionally, an attention mechanism (AM) module is incorporated to enable the network to focus more on human movement characteristics, enhancing its sensitivity to diverse location features. Furthermore, the multilayer perceptron (MLP) module is employed to effectively discern intricate patterns and features within the input data, facilitating its classification into multiple categories. Experimental evaluations conducted on publicly available datasets demonstrate that the proposed approach achieves state-of-the-art performance levels. Notably, in comparison to existing state-of-the-art (SOTA) methods, this approach exhibits a marked improvement in accuracy. These results corroborate the efficacy of the I3D-AM-MLP framework, indicating its significance in extracting advanced human movement feature expressions and automating the assessment of functional movement screening. Full article

Review

Jump to: Research

31 pages, 1117 KiB  
Review
Traffic Sign Detection and Recognition Using YOLO Object Detection Algorithm: A Systematic Review
by Marco Flores-Calero, César A. Astudillo, Diego Guevara, Jessica Maza, Bryan S. Lita, Bryan Defaz, Juan S. Ante, David Zabala-Blanco and José María Armingol Moreno
Mathematics 2024, 12(2), 297; https://doi.org/10.3390/math12020297 - 17 Jan 2024
Viewed by 3961
Abstract
Context: YOLO (You Only Look Once) is an algorithm based on deep neural networks with real-time object detection capabilities. This state-of-the-art technology is widely available, mainly due to its speed and precision. Since its conception, YOLO has been applied to detect and recognize traffic signs, pedestrians, traffic lights, vehicles, and so on. Objective: The goal of this research is to systematically analyze the YOLO object detection algorithm, applied to traffic sign detection and recognition systems, from five relevant aspects of this technology: applications, datasets, metrics, hardware, and challenges. Method: This study performs a systematic literature review (SLR) of studies on traffic sign detection and recognition using YOLO published in the years 2016–2022. Results: The search found 115 primary studies relevant to the goal of this research. After analyzing these investigations, the following relevant results were obtained. The most common applications of YOLO in this field are vehicular security and intelligent and autonomous vehicles. The majority of the sign datasets used to train, test, and validate YOLO-based systems are publicly available, with an emphasis on datasets from Germany and China. It was also found that most works report detection, classification, and processing-speed metrics for traffic sign detection and recognition systems built on the different versions of YOLO. In addition, the most popular desktop data processing hardware platforms are the Nvidia RTX 2080 and Titan Tesla V100 and, in the case of embedded or mobile GPU platforms, the Jetson Xavier NX. Finally, seven relevant challenges that these systems face when operating in real road conditions have been identified. With this in mind, the research has been reclassified to address these challenges in each case.
Conclusions: This SLR is the most relevant and current work in the field of technology development applied to the detection and recognition of traffic signs using YOLO. In addition, insights are provided about future work that could be conducted to improve the field. Full article

Planned Papers

The list below represents only planned manuscripts. Some of these manuscripts have not yet been received by the Editorial Office. Papers submitted to MDPI journals are subject to peer review.

Title: Transformers applied to object localization and classification in dense images
Authors: Gonzalo Rojas, Sergio Velastín, Jesús García
Affiliation: Universidad Carlos III de Madrid, Spain
