Three-Dimensional Machine Vision for Robots: Human Activity and Scene Understanding

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 15 June 2024

Special Issue Editors


Prof. Dr. Liang Zhang
Guest Editor
Embedded Technology and Visual Processing Research Center, School of Computer Science and Technology, Xidian University, Xi’an 710071, China
Interests: 3D vision; scene understanding; robot; deep learning

Dr. Guangming Zhu
Guest Editor
Embedded Technology and Visual Processing Research Center, School of Computer Science and Technology, Xidian University, Xi’an 710071, China
Interests: scene understanding; robot; human–object interaction

Special Issue Information

Dear Colleagues,

Robots are now deployed in many different areas. For service robots in particular, human activity and scene understanding are essential, because a robot must build a complete picture of both the people around it and its environment. These capabilities also underpin applications such as security, surveillance, human–computer interaction, patient monitoring and analysis, sports, and robotics at large. With the development of deep learning techniques for robotic video analysis, scene understanding, and natural language processing, multimodal features (including appearance, spatial, and semantic features) extracted from video frames, skeleton data, and semantic labels have all been used to improve the performance of human activity and scene understanding. Vision transformers and graph models, in turn, have achieved exemplary performance across a broad range of computer vision tasks, e.g., image recognition, object detection, segmentation, and image captioning, all of which help robots develop understanding and perception.
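To make the graph-model ingredient above concrete, here is a minimal sketch, in PyTorch, of a single graph-convolution layer over skeleton joints, of the kind used in skeleton-based action recognition; the chain-shaped skeleton, joint count, and feature sizes are illustrative placeholders, not taken from any particular paper:

    import torch
    import torch.nn as nn

    class SkeletonGraphConv(nn.Module):
        """One graph-convolution layer: mix joint features along skeleton edges."""
        def __init__(self, in_dim, out_dim, adjacency):
            super().__init__()
            # Normalized adjacency with self-loops: A_hat = D^-1 (A + I)
            a_hat = adjacency + torch.eye(adjacency.size(0))
            self.register_buffer("a_hat", a_hat / a_hat.sum(dim=1, keepdim=True))
            self.proj = nn.Linear(in_dim, out_dim)

        def forward(self, x):
            # x: (batch, joints, features); aggregate neighbors, then project
            return torch.relu(self.proj(self.a_hat @ x))

    # Toy usage: a 5-joint chain skeleton, 3D joint coordinates as input features
    adj = torch.zeros(5, 5)
    for i in range(4):
        adj[i, i + 1] = adj[i + 1, i] = 1.0
    layer = SkeletonGraphConv(3, 16, adj)
    out = layer(torch.randn(2, 5, 3))  # -> (2, 5, 16)

Stacking such layers, and adding a temporal dimension across video frames, yields the spatio-temporal graph networks commonly used for action recognition.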

This Special Issue seeks original contributions that advance the theory and algorithmic design of vision transformers and graph models, with a focus on state-of-the-art techniques built on these models for human activity understanding, addressing important problems in 3D robot action/activity recognition, understanding, prediction, and related tasks.

In this Special Issue, original research articles and reviews are welcome. Research areas may include (but are not limited to) the following:

  • Human–object interaction recognition;
  • Graph models;
  • Action recognition;
  • Graph neural networks;
  • Action prediction;
  • Two-/Three-dimensional scene understanding;
  • Two-/Three-dimensional object recognition.

Prof. Dr. Liang Zhang
Dr. Guangming Zhu
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • robot scene understanding
  • action recognition
  • graph models
  • action prediction

Published Papers (3 papers)


Research

26 pages, 17621 KiB  
Article
DPCalib: Dual-Perspective View Network for LiDAR-Camera Joint Calibration
by Jinghao Cao, Xiong Yang, Sheng Liu, Tiejian Tang, Yang Li and Sidan Du
Electronics 2024, 13(10), 1914; https://doi.org/10.3390/electronics13101914 - 13 May 2024
Abstract
The precise calibration of a LiDAR-camera system is a crucial prerequisite for multimodal 3D information fusion in perception systems. The accuracy and robustness of existing traditional offline calibration methods are inferior to methods based on deep learning. Meanwhile, most parameter regression-based online calibration methods directly project LiDAR data onto a specific plane, leading to information loss and perceptual limitations. This paper proposes DPCalib, a dual-perspective view network that mitigates these issues through a novel architecture for fusing and reusing the input information. We design a feature encoder that effectively extracts features from two orthogonal views using attention mechanisms, together with an effective decoder that aggregates the features from both views to produce accurate extrinsic parameter estimates. The experimental results demonstrate that our approach outperforms existing SOTA methods, and ablation experiments validate the rationality and effectiveness of our design.
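The following is a minimal, hypothetical sketch of the dual-perspective idea described in this abstract, not the authors' DPCalib implementation: two small encoders process two orthogonal LiDAR projection views, a learned attention weighting fuses them, and an MLP head regresses the six extrinsic parameters (all layer sizes are placeholders):

    import torch
    import torch.nn as nn

    class DualViewCalibNet(nn.Module):
        """Toy dual-view network: encode two orthogonal LiDAR projections,
        fuse them with learned view weights, regress 6-DoF extrinsics."""
        def __init__(self, feat_dim=64):
            super().__init__()
            def encoder():  # one tiny conv encoder per view
                return nn.Sequential(
                    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                    nn.Conv2d(16, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten())
            self.enc_a, self.enc_b = encoder(), encoder()
            # Attention over the two views: softmax yields per-view weights
            self.attn = nn.Sequential(nn.Linear(2 * feat_dim, 2), nn.Softmax(dim=-1))
            self.head = nn.Sequential(
                nn.Linear(2 * feat_dim, 128), nn.ReLU(),
                nn.Linear(128, 6))  # 3 rotation + 3 translation parameters

        def forward(self, view_a, view_b):
            fa, fb = self.enc_a(view_a), self.enc_b(view_b)
            w = self.attn(torch.cat([fa, fb], dim=-1))       # (batch, 2)
            fused = torch.cat([w[:, :1] * fa, w[:, 1:] * fb], dim=-1)
            return self.head(fused)

    net = DualViewCalibNet()
    params = net(torch.randn(2, 1, 64, 64), torch.randn(2, 1, 64, 64))  # (2, 6)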

13 pages, 16460 KiB  
Article
Research on PointPillars Algorithm Based on Feature-Enhanced Backbone Network
by Xiaoning Shu and Liang Zhang
Electronics 2024, 13(7), 1233; https://doi.org/10.3390/electronics13071233 - 27 Mar 2024
Abstract
The 3D target detection algorithm PointPillars has gained popularity in the industrial field, but improving its detection accuracy while maintaining high efficiency remains a significant challenge. To address the low target detection accuracy of PointPillars, this paper proposes a feature-enhancement-based improvement of its backbone network. The algorithm enriches the backbone's preliminary feature information with channel attention and spatial attention mechanisms. To address the inefficiency caused by the excessive number of down-sampling parameters in PointPillars, FasterNet (a lightweight and efficient feature extraction network) is used for down-sampling and for forming feature maps at different scales. To prevent the loss and blurring of extracted features caused by transposed convolution, the lightweight and efficient up-sampling modules CARAFE and DySample are used to adjust resolution. Experimental results indicate improved accuracy at all difficulty levels of the KITTI dataset, demonstrating the superiority of the algorithm over the original PointPillars.
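A channel-plus-spatial attention block of the kind this abstract describes can be sketched as follows (a CBAM-style module applied to a PointPillars-like pseudo-image feature map; the dimensions are illustrative, and this is not the paper's exact design):

    import torch
    import torch.nn as nn

    class ChannelSpatialAttention(nn.Module):
        """CBAM-style block: reweight channels, then spatial positions."""
        def __init__(self, channels, reduction=4):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(channels, channels // reduction), nn.ReLU(),
                nn.Linear(channels // reduction, channels))
            self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

        def forward(self, x):
            b, c = x.shape[:2]
            # Channel attention from average- and max-pooled global descriptors
            gate = torch.sigmoid(self.mlp(x.mean(dim=(2, 3))) +
                                 self.mlp(x.amax(dim=(2, 3))))
            x = x * gate.view(b, c, 1, 1)
            # Spatial attention from per-position channel mean and max
            stats = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
            return x * torch.sigmoid(self.spatial(stats))

    # Applied to a pseudo-image feature map like the one PointPillars produces
    enhanced = ChannelSpatialAttention(64)(torch.randn(2, 64, 100, 100))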

17 pages, 2228 KiB  
Article
Applying Machine Learning to Construct a Printed Circuit Board Gold Finger Defect Detection System
by Chien-Yi Huang and Pei-Xuan Tsai
Electronics 2024, 13(6), 1090; https://doi.org/10.3390/electronics13061090 - 15 Mar 2024
Abstract
Machine vision systems use industrial cameras’ digital sensors to collect images and use computers for image pre-processing, analysis, and the measurement of various features in order to make decisions. With increasing capacity and quality demands in the electronics industry, incoming quality control (IQC) standards are becoming more and more stringent. The industry’s incoming quality control is mainly based on manual sampling; although this saves time and cost, the miss rate remains high. This study aimed to establish an automatic defect detection system that could quickly identify defects in the gold fingers on printed circuit boards (PCBs) according to the manufacturer’s standard. During the iterative training process of deep learning, the parameters required for image processing and inference are updated automatically. In this study, we compared the object detection networks YOLOv3 (You Only Look Once, version 3) and the Faster Region-Based Convolutional Neural Network (Faster R-CNN). The results showed that the defect classification and detection model built on the YOLOv3 architecture could identify defects with an accuracy of 95%. Consequently, the IQC sampling inspection was changed to a full inspection, and the surface mount technology (SMT) full inspection station was eliminated, reducing the need for inspection personnel.
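As a hedged illustration of the decision step in such a system (not the authors' code), the sketch below filters a trained detector's outputs by confidence and converts them into an IQC pass/fail verdict; the defect class names and threshold are hypothetical:

    from dataclasses import dataclass

    DEFECT_CLASSES = ("scratch", "exposed_copper", "oxidation")  # hypothetical labels
    CONF_THRESHOLD = 0.5  # illustrative cut-off, tuned per manufacturer standard

    @dataclass
    class Detection:
        class_id: int      # index into DEFECT_CLASSES
        confidence: float  # detector score in [0, 1]
        box: tuple         # (x1, y1, x2, y2) in pixels

    def inspect(detections):
        """Return (passes_iqc, defect_names) for one gold-finger image."""
        found = [DEFECT_CLASSES[d.class_id]
                 for d in detections if d.confidence >= CONF_THRESHOLD]
        return (len(found) == 0, found)

    # Example: one confident scratch detection fails the board
    ok, defects = inspect([Detection(0, 0.91, (120, 40, 180, 90))])
    print(ok, defects)  # False ['scratch']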
