
Computer Vision and Machine Learning for Object Tracking and Recognition

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensing and Imaging".

Deadline for manuscript submissions: closed (15 February 2024) | Viewed by 8737

Special Issue Editors


Guest Editor
Department of Information Science, Xi’an University of Technology, Xi’an 710048, China
Interests: visual information processing; pattern recognition

Guest Editor
School of Electrical and Computer Engineering, Oklahoma State University, Stillwater, OK 74078, USA
Interests: image processing; machine learning; pattern recognition; computer vision; biomedical imaging and multimedia applications

Guest Editor
College of Electrical and Information Engineering, Hunan University, Changsha 410082, China
Interests: computer vision; machine learning; pattern recognition

Guest Editor
School of Electronic and Information Engineering, South China University of Technology, Guangzhou 511442, China
Interests: intelligent human-computer interaction; vision-based human motion analysis; vision-based target tracking; machine learning and pattern recognition; signal and image processing; medical image analysis

Special Issue Information

Dear Colleagues,

Recently, we have seen great progress in the use of computer vision and machine learning for object tracking and recognition. Computer vision draws its information from vision sensors, and this visual information opens new opportunities for large language models (LLMs) such as ChatGPT. LLMs can generate suitable content from input text, which makes multimodal cognition feasible through model computation. Object tracking and recognition benefit from the strong generalization ability of LLMs for modeling the dynamic processes of visual information. Recent advances in object tracking and recognition have found applications in industry, agriculture, the military, medicine, and education, where they support the understanding of specific objects, people, and scenes.

This Special Issue aims to bring together original research and review articles on recent advances, technologies, solutions, applications, and new challenges in the field of computer vision.

Potential topics include but are not limited to:

  • Image processing techniques for object tracking and recognition.
  • Feature extraction and selection methods in computer vision.
  • Deep learning architectures for object detection and tracking.
  • Convolutional neural networks (CNNs) for object recognition.
  • Object-tracking algorithms.
  • Multi-object tracking in complex environments.
  • Transfer learning for object recognition and tracking.
  • Real-time object tracking and recognition on embedded systems.
  • Object tracking and recognition in video sequences.
  • Evaluation metrics and benchmark datasets for object tracking and recognition.

Dr. Guangfeng Lin
Prof. Dr. Guoliang Fan
Dr. Zhigang Ling
Dr. Xin Zhang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • vision sensors
  • computer vision
  • machine learning
  • object tracking
  • object recognition
  • image processing
  • deep learning
  • feature extraction
  • neural networks
  • visual perception
  • real-time tracking

Published Papers (9 papers)


Research

13 pages, 3743 KiB  
Communication
Real-Time 3D Tracking of Multi-Particle in the Wide-Field Illumination Based on Deep Learning
by Xiao Luo, Jie Zhang, Handong Tan, Jiahao Jiang, Junda Li and Weijia Wen
Sensors 2024, 24(8), 2583; https://doi.org/10.3390/s24082583 - 18 Apr 2024
Viewed by 439
Abstract
In diverse realms of research, such as holographic optical tweezer mechanical measurements, colloidal particle motion state examinations, cell tracking, and drug delivery, the localization and analysis of particle motion are of paramount significance. Algorithms ranging from conventional numerical methods to advanced deep-learning networks have made substantial strides in particle orientation analysis. However, the need for datasets has hindered the application of deep learning in particle tracking. In this work, we present an effective methodology for generating synthetic datasets for this domain that remains robust and precise when applied to real-world 3D particle-tracking data. We developed a 3D real-time particle positioning network based on the CenterNet network. In our experiments, the network achieved a horizontal positioning error of 0.0478 μm and a z-axis positioning error of 0.1990 μm. It can track particles of diverse dimensions near the focal plane in real time with high precision. In addition, we have made all datasets created during this investigation accessible. Full article
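
The positioning network above builds on CenterNet, which localizes each object (here, a particle) as a peak in a predicted center heatmap. The following is a minimal sketch of the standard CenterNet-style decoding step, not code from the paper; the tensor shapes, threshold, and function names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def decode_heatmap(heatmap: torch.Tensor, k: int = 100, threshold: float = 0.3):
    """Extract up to k particle centers from a [1, 1, H, W] center heatmap."""
    # Keep only local maxima: a pixel survives if 3x3 max-pooling leaves it unchanged.
    pooled = F.max_pool2d(heatmap, kernel_size=3, stride=1, padding=1)
    peaks = heatmap * (pooled == heatmap).float()

    scores, indices = torch.topk(peaks.flatten(), k)
    keep = scores > threshold
    scores, indices = scores[keep], indices[keep]

    width = heatmap.shape[-1]
    ys = torch.div(indices, width, rounding_mode="floor").float()  # row coordinates
    xs = (indices % width).float()                                 # column coordinates
    return xs, ys, scores

# Dummy heatmap standing in for the network output:
heatmap = torch.zeros(1, 1, 64, 64)
heatmap[0, 0, 20, 30] = 0.9
xs, ys, scores = decode_heatmap(heatmap)
print(xs, ys, scores)  # tensor([30.]) tensor([20.]) tensor([0.9000])
```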

16 pages, 3961 KiB  
Article
Human Action Recognition and Note Recognition: A Deep Learning Approach Using STA-GCN
by Avirmed Enkhbat, Timothy K. Shih and Pimpa Cheewaprakobkit
Sensors 2024, 24(8), 2519; https://doi.org/10.3390/s24082519 - 14 Apr 2024
Viewed by 698
Abstract
Human action recognition (HAR) is a growing area of machine learning with a wide range of applications. One challenging aspect of HAR is recognizing human actions while playing music, further complicated by the need to recognize the musical notes being played. This paper proposes a deep learning-based method for simultaneous HAR and musical note recognition in music performances. We conducted experiments on performances of the Morin khuur, a traditional Mongolian instrument. The proposed method consists of two stages. First, we created a new dataset of Morin khuur performances. We used motion capture systems and depth sensors to collect data that includes hand keypoints, instrument segmentation information, and detailed movement information. We then analyzed RGB images, depth images, and motion data to determine which type of data provides the most valuable features for recognizing actions and notes in music performances. The second stage utilizes a Spatial Temporal Attention Graph Convolutional Network (STA-GCN) to recognize musical notes as continuous gestures. The STA-GCN model is designed to learn the relationships between hand keypoints and instrument segmentation information, which are crucial for accurate recognition. Evaluation on our dataset demonstrates that our model outperforms the traditional ST-GCN model, achieving an accuracy of 81.4%. Full article
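
An ST(A)-GCN treats the hand keypoints as nodes of a graph and propagates features along attention-weighted skeleton edges. The toy layer below illustrates that core operation, a single spatial graph convolution with a learnable edge-attention mask; the chain adjacency, feature sizes, and attention form are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, adjacency: torch.Tensor):
        super().__init__()
        # Symmetrically normalize A + I so repeated propagation stays stable.
        a = adjacency + torch.eye(adjacency.shape[0])
        d = a.sum(dim=1).rsqrt().diag()
        self.register_buffer("a_norm", d @ a @ d)
        self.proj = nn.Linear(in_ch, out_ch)
        # Learnable per-edge mask, standing in for STA-GCN's spatial attention.
        self.attn = nn.Parameter(torch.ones_like(a))

    def forward(self, x):                        # x: [batch, joints, in_ch]
        mixed = (self.a_norm * self.attn) @ x    # propagate along weighted edges
        return torch.relu(self.proj(mixed))

# 21 hand keypoints in a simple chain adjacency, as a stand-in skeleton:
n = 21
adj = torch.zeros(n, n)
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1
layer = GraphConv(in_ch=3, out_ch=16, adjacency=adj)
out = layer(torch.randn(2, n, 3))   # -> [2, 21, 16]
```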

19 pages, 4737 KiB  
Article
SEB-YOLO: An Improved YOLOv5 Model for Remote Sensing Small Target Detection
by Yan Hui, Shijie You, Xiuhua Hu, Panpan Yang and Jing Zhao
Sensors 2024, 24(7), 2193; https://doi.org/10.3390/s24072193 - 29 Mar 2024
Cited by 1 | Viewed by 937
Abstract
The limited semantic information that can be extracted from small objects, together with the difficulty of distinguishing similar targets, poses great challenges to target detection in remote sensing scenarios and results in poor detection performance. This paper proposes an improved YOLOv5 remote sensing image target detection algorithm, SEB-YOLO (SPD-Conv + ECSPP + Bi-FPN + YOLOv5). Firstly, the space-to-depth (SPD) layer followed by a non-strided convolution (Conv) layer module (SPD-Conv) was used to reconstruct the backbone network, which retained the global features and reduced the feature loss. Meanwhile, the pooling module with the attention mechanism of the final layer of the backbone network was designed to help the network better identify and locate the target. Furthermore, a bidirectional feature pyramid network (Bi-FPN) with bilinear interpolation upsampling was added to improve bidirectional cross-scale connection and weighted feature fusion. Finally, a decoupled head is introduced to enhance the model convergence and solve the contradiction between the classification task and the regression task. Experimental results on NWPU VHR-10 and RSOD datasets show that the mAP of the proposed algorithm reaches 93.5% and 93.9%, respectively, which is 4.0% and 5.3% higher than that of the original YOLOv5l algorithm. The proposed algorithm achieves better detection results for complex remote sensing images. Full article
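
Of the components listed above, SPD-Conv is the most self-contained: a space-to-depth rearrangement followed by a non-strided convolution, so the feature map is downsampled without discarding pixels the way strided convolutions or pooling do. Below is a minimal sketch of such a block; the channel counts and kernel size are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SPDConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, scale: int = 2):
        super().__init__()
        self.scale = scale
        # Space-to-depth multiplies channels by scale**2; the conv is stride 1.
        self.conv = nn.Conv2d(in_ch * scale ** 2, out_ch,
                              kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        s = self.scale
        b, c, h, w = x.shape
        # Rearrange each s x s spatial block into channels (space-to-depth).
        x = x.view(b, c, h // s, s, w // s, s)
        x = x.permute(0, 1, 3, 5, 2, 4).reshape(b, c * s * s, h // s, w // s)
        return self.conv(x)

block = SPDConv(in_ch=64, out_ch=128)
y = block(torch.randn(1, 64, 80, 80))  # -> [1, 128, 40, 40]
```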

22 pages, 5843 KiB  
Article
PSMOT: Online Occlusion-Aware Multi-Object Tracking Exploiting Position Sensitivity
by Ranyang Zhao, Xinyan Zhang and Jianwei Zhang
Sensors 2024, 24(4), 1199; https://doi.org/10.3390/s24041199 - 12 Feb 2024
Viewed by 731
Abstract
Models based on joint detection and re-identification (ReID), which significantly increase the efficiency of online multi-object tracking (MOT) systems, are an evolution from separate detection and ReID models in the tracking-by-detection (TBD) paradigm. It is observed that these joint models are typically one-stage, while two-stage models have become obsolete because of their slow speed and low efficiency. However, two-stage models have inherent advantages over one-stage anchor-based and anchor-free models in handling feature misalignment and occlusion, which suggests that two-stage models, via meticulous design, could be on par with the state-of-the-art one-stage models. Following this intuition, we propose a robust and efficient two-stage joint model based on R-FCN, whose backbone and neck are fully convolutional, and the RoI-wise process only involves simple calculations. In the first stage, an adaptive sparse anchoring scheme is utilized to produce adequate, high-quality proposals to improve efficiency. To boost both detection and ReID, two key elements, feature aggregation and feature disentanglement, are taken into account. To improve robustness against occlusion, position sensitivity is exploited, first to estimate occlusion and then to guide the anti-occlusion post-processing. Finally, we link the model to a hierarchical association algorithm to form a complete MOT system called PSMOT. Compared to other cutting-edge systems, PSMOT achieves competitive performance while maintaining time efficiency. Full article
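
The position sensitivity that PSMOT exploits comes from its R-FCN-style design, in which each k × k bin of an RoI is pooled from its own dedicated group of score-map channels, so per-part responses are preserved and a weak bin can hint that the corresponding part is occluded. The snippet below shows the underlying operation using torchvision's position-sensitive RoI pooling; the sizes are illustrative, and the occlusion heuristic in the comment is an assumption rather than the paper's method.

```python
import torch
from torchvision.ops import ps_roi_pool

k = 3                                         # k x k spatial bins per RoI
score_maps = torch.randn(1, k * k, 32, 32)    # one channel group per bin
boxes = torch.tensor([[0, 4.0, 4.0, 20.0, 20.0]])  # (batch_idx, x1, y1, x2, y2)
per_part = ps_roi_pool(score_maps, boxes, output_size=k, spatial_scale=1.0)
# per_part: [1, 1, 3, 3]; comparing the nine bin responses against each other
# is one way a weak part response could signal occlusion of that part.
print(per_part.squeeze())
```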

13 pages, 6551 KiB  
Article
Object Recognition and Grasping for Collaborative Robots Based on Vision
by Ruohuai Sun, Chengdong Wu, Xue Zhao, Bin Zhao and Yang Jiang
Sensors 2024, 24(1), 195; https://doi.org/10.3390/s24010195 - 28 Dec 2023
Cited by 1 | Viewed by 1321
Abstract
This study introduces a parallel YOLO–GG deep learning network for collaborative robot target recognition and grasping to enhance the efficiency and precision of visual classification and grasping for collaborative robots. First, the paper outlines the target classification and detection task, the grasping system of the robotic arm, and the dataset preprocessing method. The real-time recognition and grasping network can identify a diverse spectrum of unidentified objects and determine the target type and appropriate capture box. Secondly, we propose a parallel YOLO–GG deep vision network based on YOLO and GG-CNN. Thirdly, the YOLOv3 network, pre-trained with the COCO dataset, identifies the object category and position, while the GG-CNN network, trained using the Cornell Grasping dataset, predicts the grasping pose and scale. This study presents the processes for generating a target’s grasping frame and recognition type using GG-CNN and YOLO networks, respectively. This completes the investigation of parallel networks for target recognition and grasping in collaborative robots. Finally, the experimental results are evaluated on the self-constructed NEU-COCO dataset for target recognition and positional grasping. The speed of detection has improved by 14.1%, with an accuracy of 94%. This accuracy is 4.0% greater than that of YOLOv3. Experimental proof was obtained through a robot grasping actual objects. Full article
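
Running detection (YOLO) and grasp prediction (GG-CNN) in parallel leaves a fusion step: deciding which grasp belongs to which detected object. The sketch below shows one plausible, minimal version of that step, assigning each detection the best-quality grasp whose center falls inside its box; the data structures and matching rule are assumptions for illustration, not the paper's actual interface.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    box: tuple     # (x1, y1, x2, y2)

@dataclass
class Grasp:
    center: tuple  # (x, y)
    angle: float   # gripper rotation in radians
    width: float   # gripper opening
    quality: float # GG-CNN-style grasp-quality score

def assign_grasps(detections, grasps):
    """Return the best-quality grasp whose center lies inside each detected box."""
    assigned = {}
    for det in detections:
        x1, y1, x2, y2 = det.box
        inside = [g for g in grasps
                  if x1 <= g.center[0] <= x2 and y1 <= g.center[1] <= y2]
        if inside:
            assigned[det.label] = max(inside, key=lambda g: g.quality)
    return assigned

dets = [Detection("cup", (10, 10, 60, 60))]
grasps = [Grasp((30, 35), 0.6, 22.0, 0.91), Grasp((200, 200), 0.1, 30.0, 0.80)]
print(assign_grasps(dets, grasps))  # {'cup': Grasp(center=(30, 35), ...)}
```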

25 pages, 14789 KiB  
Article
DLUT: Decoupled Learning-Based Unsupervised Tracker
by Zhengjun Xu, Detian Huang, Xiaoqian Huang, Jiaxun Song and Hang Liu
Sensors 2024, 24(1), 83; https://doi.org/10.3390/s24010083 - 23 Dec 2023
Viewed by 699
Abstract
Unsupervised learning has shown immense potential in object tracking, where accurate classification and regression are crucial for unsupervised trackers. However, the classification and regression branches of most unsupervised trackers calculate object similarities by sharing cross-correlation modules. This leads to high coupling between different branches, thus hindering the network performance. To address the above issue, we propose a Decoupled Learning-based Unsupervised Tracker (DLUT). Specifically, we separate the training pipelines of different branches to unlock their inherent learning potential so that different branches can fully explore the focused feature regions of interest. Furthermore, we design independent adaptive decoupling-correlation modules according to the characteristics of each branch to obtain more discriminative and easily locatable feature response maps. Finally, to suppress the noise interference brought by unsupervised pseudo-label training and highlight the foreground object, we propose a novel suppression-ranking-based unsupervised training strategy. Extensive experiments demonstrate that our DLUT outperforms state-of-the-art unsupervised trackers. Full article
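
The cross-correlation module mentioned above is the similarity-computing core of Siamese trackers: the template feature acts as a convolution kernel slid over the search-region feature. A minimal sketch of its common depthwise form follows; the feature shapes are illustrative assumptions, and DLUT's decoupled design would replace one shared module like this with branch-specific variants.

```python
import torch
import torch.nn.functional as F

def depthwise_xcorr(search: torch.Tensor, template: torch.Tensor) -> torch.Tensor:
    """search: [B, C, Hs, Ws]; template: [B, C, Ht, Wt] -> response [B, C, Ho, Wo]."""
    b, c, h, w = search.shape
    # Fold batch into channels so groups=b*c correlates each channel independently.
    search = search.reshape(1, b * c, h, w)
    kernel = template.reshape(b * c, 1, *template.shape[2:])
    out = F.conv2d(search, kernel, groups=b * c)
    return out.reshape(b, c, *out.shape[2:])

response = depthwise_xcorr(torch.randn(2, 256, 31, 31), torch.randn(2, 256, 7, 7))
print(response.shape)  # torch.Size([2, 256, 25, 25])
```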

21 pages, 5769 KiB  
Article
SiamHSFT: A Siamese Network-Based Tracker with Hierarchical Sparse Fusion and Transformer for UAV Tracking
by Xiuhua Hu, Jing Zhao, Yan Hui, Shuang Li and Shijie You
Sensors 2023, 23(21), 8666; https://doi.org/10.3390/s23218666 - 24 Oct 2023
Viewed by 858
Abstract
Due to the high maneuverability as well as the hardware limitations of Unmanned Aerial Vehicle (UAV) platforms, tracking targets in UAV views often encounters challenges such as low resolution, fast motion, and background interference, which make it difficult to strike a balance between performance and efficiency. Based on the Siamese network framework, this paper proposes a novel UAV tracking algorithm, SiamHSFT, aiming to achieve a balance between tracking robustness and real-time computation. Firstly, by combining CBAM attention and downward information interaction in the feature enhancement module, the proposed method merges high-level and low-level feature maps to prevent the loss of information when dealing with small targets. Secondly, it focuses on both long and short spatial intervals within the affinity in the interlaced sparse attention module, thereby enhancing the utilization of global context and prioritizing crucial information in feature extraction. Lastly, the Transformer's encoder is optimized with a modulation enhancement layer, which integrates triplet attention to enhance inter-layer dependencies and improve target discrimination. Experimental results demonstrate SiamHSFT's excellent performance across diverse datasets, including UAV123, UAV20L, UAV123@10fps, and DTB70. Notably, it performs better in fast motion and dynamic blurring scenarios. Meanwhile, it maintains an average tracking speed of 126.7 fps across all datasets, meeting real-time tracking requirements. Full article
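
CBAM, used in the feature-enhancement module above, applies channel attention computed from globally pooled descriptors and then spatial attention computed from channel-pooled maps. A compact sketch follows, using the common CBAM defaults (reduction ratio 16, 7 × 7 spatial kernel) as assumptions.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        # Channel attention: shared MLP over global average- and max-pooled vectors.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx)[:, :, None, None]
        # Spatial attention: 7x7 conv over stacked channel-wise mean and max maps.
        stacked = torch.cat([x.mean(dim=1, keepdim=True),
                             x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(stacked))

y = CBAM(64)(torch.randn(1, 64, 32, 32))  # same shape out: [1, 64, 32, 32]
```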

26 pages, 10602 KiB  
Article
Discovery, Quantitative Recurrence, and Inhibition of Motion-Blur Hysteresis Phenomenon in Visual Tracking Displacement Detection
by Lixiang Shi and Jianping Tan
Sensors 2023, 23(19), 8024; https://doi.org/10.3390/s23198024 - 22 Sep 2023
Cited by 1 | Viewed by 710
Abstract
Motion blur is common in video tracking and detection, and severe motion blur can lead to failure in tracking and detection. In this work, a motion-blur hysteresis phenomenon (MBHP) was discovered, which has an impact on tracking and detection accuracy as well as image annotation. In order to accurately quantify MBHP, this paper proposes a motion-blur dataset construction method based on a motion-blur operator (MBO) generation method and self-similar object images, and designs APSF, an MBO generation method. The optimized sub-pixel estimation method of the point spread function (SPEPSF) is used to demonstrate the accuracy and robustness of the APSF method, showing the maximum error (ME) of APSF to be smaller than that of other methods (reduced by 86% when motion-blur length > 20 and motion-blur angle = 0) and the mean square error (MSE) of APSF to be smaller than that of other methods (reduced by 65.67% when motion-blur angle = 0). A fast image matching method based on a fast correlation response coefficient (FAST-PCC) and an improved KCF were used with the motion-blur dataset to quantify MBHP. The results show that MBHP exists significantly when the motion blur changes and that the error caused by MBHP is close to half of the difference of the motion-blur length between two consecutive frames. A general flow chart of visual tracking displacement detection with error compensation for MBHP was designed, and three methods for calculating compensation values were proposed, based on inter-frame displacement estimation error, SPEPSF, and no-reference image quality assessment (NR-IQA) indicators. Additionally, the implementation experiments showed that this error can be reduced by more than 96%. Full article
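
For intuition about what an MBO generation method must produce, the sketch below builds the naive linear motion-blur kernel, a line of equal weights rasterized at a given length and angle, and applies it to an image. This simple PSF, with no sub-pixel treatment, is the kind of baseline the paper's APSF method improves upon; it is not APSF itself.

```python
import numpy as np
from scipy.ndimage import convolve

def linear_psf(length: int, angle_deg: float, size: int = 31) -> np.ndarray:
    """Binary line PSF of a given blur length and angle, normalized to sum to 1."""
    psf = np.zeros((size, size))
    center = size // 2
    theta = np.deg2rad(angle_deg)
    for t in np.linspace(-length / 2, length / 2, 4 * length):
        r = center + t * np.sin(theta)
        c = center + t * np.cos(theta)
        psf[int(round(r)), int(round(c))] = 1.0   # nearest-pixel rasterization
    return psf / psf.sum()

image = np.random.rand(128, 128)
blurred = convolve(image, linear_psf(length=15, angle_deg=30.0), mode="reflect")
```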

15 pages, 4742 KiB  
Article
Multi-Object Tracking on SWIR Images for City Surveillance in an Edge-Computing Environment
by Jihun Park, Jinseok Hong, Wooil Shim and Dae-Jin Jung
Sensors 2023, 23(14), 6373; https://doi.org/10.3390/s23146373 - 13 Jul 2023
Cited by 2 | Viewed by 1336
Abstract
Although Short-Wave Infrared (SWIR) sensors have advantages in terms of robustness in bad weather and low-light conditions, SWIR images have not been well studied for automated object detection and tracking systems. The majority of previous multi-object tracking studies have focused on pedestrian tracking in visible-spectrum images, but tracking different types of vehicles is also important in city-surveillance scenarios. In addition, the previous studies were based on high-computing-power environments such as GPU workstations or servers, but edge computing should be considered to reduce network bandwidth usage and privacy concerns in city-surveillance scenarios. In this paper, we propose a fast and effective multi-object tracking method, called Multi-Class Distance-based Tracking (MCDTrack), on SWIR images of city-surveillance scenarios in a low-power and low-computation edge-computing environment. Eight-bit integer quantized object detection models are used, and simple distance- and IoU-based similarity scores are employed to realize effective multi-object tracking in an edge-computing environment. Our MCDTrack is not only superior to previous multi-object tracking methods but also achieves a high tracking accuracy of 77.5% MOTA and 80.2% IDF1 even though object detection and tracking are performed on the edge-computing device. Our study results indicate that a robust city-surveillance solution can be developed based on the edge-computing environment and low-frame-rate SWIR images. Full article
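
A simple distance- and IoU-based similarity of the kind the abstract describes can be assembled into a full association step in a few lines: blend center distance and IoU into a cost matrix, solve it with the Hungarian algorithm, and gate weak matches. The weighting, normalization, and gate values below are illustrative assumptions, not MCDTrack's.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) form."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def associate(tracks, detections, dist_norm=100.0, w_iou=0.5, gate=0.8):
    """Optimal matching on a blended center-distance + IoU cost matrix."""
    cost = np.zeros((len(tracks), len(detections)))
    for i, t in enumerate(tracks):
        for j, d in enumerate(detections):
            tc = ((t[0] + t[2]) / 2, (t[1] + t[3]) / 2)   # track box center
            dc = ((d[0] + d[2]) / 2, (d[1] + d[3]) / 2)   # detection box center
            dist = np.hypot(tc[0] - dc[0], tc[1] - dc[1]) / dist_norm
            cost[i, j] = w_iou * (1 - iou(t, d)) + (1 - w_iou) * dist
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < gate]  # drop weak matches

tracks = [(10, 10, 50, 50)]
dets = [(12, 11, 52, 49), (200, 200, 240, 240)]
print(associate(tracks, dets))  # [(0, 0)]
```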
