Signal Processing and Machine Learning for Autonomous Vehicles

A special issue of Remote Sensing (ISSN 2072-4292). This special issue belongs to the section "Engineering Remote Sensing".

Deadline for manuscript submissions: closed (1 December 2023) | Viewed by 11740

Special Issue Editors


Dr. Mennatullah Siam
Guest Editor
Lassonde School of Engineering, York University, Toronto, ON, Canada
Interests: computer vision; deep learning; few-shot learning; video object segmentation; video understanding; spatiotemporal models; interpretability

Dr. Xinshuo Weng
Guest Editor
NVIDIA, Santa Clara, CA, USA
Interests: 3D computer vision; autonomous driving; generative models; perception; imitation learning

Special Issue Information

Dear Colleagues,

Autonomous driving has attracted substantial research from the signal processing, computer vision, and machine/deep learning communities. The integration of signal processing, machine learning, and advanced sensing technologies is a key enabler for self-driving cars to operate in real-world scenarios. The variety of sensors used, such as cameras, LiDAR, GPS, radar, and ultrasound, and the ability to operate with multiple modalities offer a wide range of impactful research problems. Machine learning problems that intersect with these topics, including perception, probabilistic modeling, future prediction, path planning, and reinforcement-learning-based driving, are also active areas of autonomous driving research. Finally, moving beyond benchmarks and deploying in real-world scenarios requires robustness, the ability to handle out-of-distribution scenarios, and explicit consideration of safety. Together, these form the research topics of interest to the autonomous driving community in both academia and industry, at the intersection of signal processing and machine learning.

This Special Issue welcomes open-call submissions on current state-of-the-art and emerging technologies and methodologies in multi-modal learning, multi-sensor utilization, and the interplay between signal-processing- and learning-based approaches.

Prospective authors are invited to submit original research and technical papers in the field that have not been published and are not currently under consideration by any other journal or conference. Topics in autonomous driving include:

  • Sensor-based perception (camera, LiDAR, radar, etc.);
  • Multi-modal fusion and data fusion for autonomous driving;
  • Probabilistic modeling with multi-modal sensory input;
  • High-fidelity simulation for different sensory data;
  • Robustness to out-of-distribution scenarios;
  • Benchmarks and datasets with different sensory data;
  • Interpretability of multi-modal autonomous driving models;
  • Reinforcement-learning-based autonomous driving systems;
  • Enhanced path planning with sensory data processing;
  • Human factors and safety in autonomous driving.

Dr. Mennatullah Siam
Dr. Xinshuo Weng
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Remote Sensing is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (5 papers)


Research

25 pages, 25423 KiB  
Article
Switchable-Encoder-Based Self-Supervised Learning Framework for Monocular Depth and Pose Estimation
by Junoh Kim, Rui Gao, Jisun Park, Jinsoo Yoon and Kyungeun Cho
Remote Sens. 2023, 15(24), 5739; https://doi.org/10.3390/rs15245739 - 15 Dec 2023
Viewed by 858
Abstract
Monocular depth prediction research is essential for expanding meaning from 2D to 3D. Recent studies have focused on the application of a newly proposed encoder; however, the development within the self-supervised learning framework remains unexplored, an aspect critical for advancing foundational models of 3D semantic interpretation. Addressing the dynamic nature of encoder-based research, especially in performance evaluations for feature extraction and pre-trained models, this research proposes the switchable encoder learning framework (SELF). SELF enhances versatility by enabling the seamless integration of diverse encoders in a self-supervised learning context for depth prediction. This integration is realized through the direct transfer of feature information from the encoder and by standardizing the input structure of the decoder to accommodate various encoder architectures. Furthermore, the framework is extended and incorporated into an adaptable decoder for depth prediction and camera pose learning, employing standard loss functions. Comparative experiments with previous frameworks using the same encoder reveal that SELF achieves a 7% reduction in parameters while enhancing performance. Remarkably, substituting newly proposed algorithms in place of an encoder improves the outcomes as well as significantly decreases the number of parameters by 23%. The experimental findings highlight the ability of SELF to broaden depth factors, such as depth consistency. This framework facilitates the objective selection of algorithms as a backbone for extended research in monocular depth prediction.
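
For readers curious how a "switchable encoder" interface might look in practice, the following PyTorch sketch wraps an arbitrary backbone behind fixed-width 1x1 projections so that one shared depth/pose decoder can consume any encoder's feature pyramid. This is an illustrative sketch only; the class names, channel widths, and interface are assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): a "switchable" encoder wrapper
# that standardizes any backbone's feature pyramid for a shared depth/pose decoder.
import torch
import torch.nn as nn

class SwitchableEncoder(nn.Module):
    def __init__(self, backbone: nn.Module, backbone_channels, decoder_channels=(64, 128, 256, 512)):
        super().__init__()
        self.backbone = backbone  # assumed to return a list of multi-scale feature maps
        # 1x1 projections standardize the decoder's input structure across encoders
        self.adapters = nn.ModuleList(
            nn.Conv2d(c_in, c_out, kernel_size=1)
            for c_in, c_out in zip(backbone_channels, decoder_channels)
        )

    def forward(self, image: torch.Tensor):
        features = self.backbone(image)
        return [adapter(f) for adapter, f in zip(self.adapters, features)]
```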

26 pages, 8643 KiB  
Article
RNGC-VIWO: Robust Neural Gyroscope Calibration Aided Visual-Inertial-Wheel Odometry for Autonomous Vehicle
by Meixia Zhi, Chen Deng, Hongjuan Zhang, Hongqiong Tang, Jiao Wu and Bijun Li
Remote Sens. 2023, 15(17), 4292; https://doi.org/10.3390/rs15174292 - 31 Aug 2023
Cited by 2 | Viewed by 1299
Abstract
Accurate and robust localization using multi-modal sensors is crucial for autonomous driving applications. Although wheel encoder measurements can provide additional velocity information for visual-inertial odometry (VIO), the existing visual-inertial-wheel odometry (VIWO) still cannot avoid long-term drift caused by the low-precision attitude acquired by the gyroscope of a low-cost inertial measurement unit (IMU), especially in visually restricted scenes where the visual information cannot accurately correct for the IMU bias. In this work, leveraging the powerful data processing capability of deep learning, we propose a novel tightly coupled monocular visual-inertial-wheel odometry with neural gyroscope calibration (NGC) to obtain accurate, robust, and long-term localization for autonomous vehicles. First, to cure the drift of the gyroscope, we design a robust neural gyroscope calibration network for low-cost IMU gyroscope measurements (called NGC-Net). Following a carefully deduced mathematical calibration model, NGC-Net leverages the temporal convolutional network to extract different scale features from raw IMU measurements in the past and regress the gyroscope corrections to output the de-noised gyroscope. A series of experiments on public datasets show that our NGC-Net has better performance on gyroscope de-noising than learning methods and competes with state-of-the-art VIO methods. Moreover, based on the more accurate de-noised gyroscope, an effective strategy for combining the advantages of VIWO and NGC-Net outputs is proposed in a tightly coupled framework, which significantly improves the accuracy of the state-of-the-art VIO/VIWO methods. In long-term and large-scale urban environments, our RNGC-VIWO tracking system performs robustly, and experimental results demonstrate the superiority of our method in terms of robustness and accuracy.
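
A minimal sketch of the learned gyroscope de-noising idea is given below, assuming a small dilated temporal convolutional network that regresses per-sample corrections from a window of raw IMU readings; the network shape, channel counts, and the simple subtraction-style correction are illustrative assumptions, not NGC-Net itself.

```python
# Illustrative sketch (not NGC-Net): a dilated temporal CNN that regresses
# gyroscope corrections from a window of raw 6-axis IMU samples.
import torch
import torch.nn as nn

class GyroCalibNet(nn.Module):
    def __init__(self, in_channels: int = 6, hidden: int = 64):
        super().__init__()
        self.tcn = nn.Sequential(
            nn.Conv1d(in_channels, hidden, kernel_size=5, dilation=2, padding=4), nn.GELU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, dilation=4, padding=8), nn.GELU(),
            nn.Conv1d(hidden, 3, kernel_size=1),  # per-axis gyroscope correction
        )

    def forward(self, imu_window: torch.Tensor) -> torch.Tensor:
        # imu_window: (batch, 6, T) raw gyro + accel; returns de-noised gyro (batch, 3, T)
        raw_gyro = imu_window[:, :3]
        return raw_gyro - self.tcn(imu_window)
```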

20 pages, 8221 KiB  
Article
3D Point Cloud Object Detection Algorithm Based on Temporal Information Fusion and Uncertainty Estimation
by Guangda Xie, Yang Li, Yanping Wang, Ziyi Li and Hongquan Qu
Remote Sens. 2023, 15(12), 2986; https://doi.org/10.3390/rs15122986 - 8 Jun 2023
Cited by 1 | Viewed by 2067
Abstract
In autonomous driving, LiDAR (light detection and ranging) data are acquired over time. Most existing 3D object detection algorithms propose the object bounding box by processing each frame of data independently, which ignores the temporal sequence information. However, the temporal sequence information is usually helpful to detect the object with missing shape information due to long distance or occlusion. To address this problem, we propose a temporal sequence information fusion 3D point cloud object detection algorithm based on the Ada-GRU (adaptive gated recurrent unit). In this method, the feature of each frame for the LiDAR point cloud is extracted through the backbone network and is fed to the Ada-GRU together with the hidden features of the previous frames. Compared to the traditional GRU, the Ada-GRU can adjust the gating mechanism adaptively during the training process by introducing the adaptive activation function. The Ada-GRU outputs the temporal sequence fusion features to predict the 3D object in the current frame and transmits the hidden features of the current frame to the next frame. At the same time, the label uncertainty of the distant and occluded objects affects the training effect of the model. For this problem, this paper proposes a probability distribution model of 3D bounding box coordinates based on the Gaussian distribution function and designs the corresponding bounding box loss function to enable the model to learn and estimate the uncertainty of the positioning of the bounding box coordinates, so as to remove the bounding box with large positioning uncertainty in the post-processing stage to reduce the false positive rate. Finally, the experiments show that the methods proposed in this paper improve the accuracy of the object detection without significantly increasing the complexity of the algorithm.
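
The Gaussian modeling of box coordinates described above amounts to predicting a mean and a variance per coordinate and training with a negative log-likelihood, so that boxes with large predicted variance can be filtered in post-processing. A generic sketch of such a loss, with an assumed 7-parameter box encoding, is shown below; it is not the paper's exact formulation.

```python
# Illustrative sketch (not the paper's exact loss): Gaussian negative log-likelihood
# for box regression, where each coordinate gets a predicted mean and log-variance.
import torch

def gaussian_box_nll(pred_mean: torch.Tensor,
                     pred_log_var: torch.Tensor,
                     target: torch.Tensor) -> torch.Tensor:
    # shapes: (num_boxes, 7) for an assumed (x, y, z, w, l, h, yaw) encoding
    var = pred_log_var.exp()
    nll = 0.5 * ((target - pred_mean) ** 2 / var + pred_log_var)
    return nll.mean()

# In post-processing, boxes whose predicted variance is large can be suppressed
# to reduce false positives, mirroring the strategy described in the abstract.
```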

26 pages, 21365 KiB  
Article
Traffic Sign Detection and Recognition Using Multi-Frame Embedding of Video-Log Images
by Jian Xu, Yuchun Huang and Dakan Ying
Remote Sens. 2023, 15(12), 2959; https://doi.org/10.3390/rs15122959 - 6 Jun 2023
Cited by 3 | Viewed by 3193
Abstract
The detection and recognition of traffic signs is an essential component of intelligent vehicle perception systems, which use on-board cameras to sense traffic sign information. Unfortunately, issues such as long-tailed distribution, occlusion, and deformation greatly decrease the detector’s performance. In this research, YOLOv5 is used as a single classification detector for traffic sign localization. Afterwards, we propose a hierarchical classification model (HCM) for the specific classification, which significantly reduces the degree of imbalance between classes without changing the sample size. To cope with the shortcomings of a single image, a training-free multi-frame information integration module (MIM) was constructed, which can extract the detection sequence of traffic signs based on the embedding generated by the HCM. The extracted temporal detection information is used for the redefinition of categories and confidence. At last, this research performed detection and recognition of the full class on two publicly available datasets, TT100K and ONCE. Experimental results show that the HCM-improved YOLOv5 has a mAP of 79.0 in full classes, which exceeds that of state-of-the-art methods, and achieves an inference speed of 22.7 FPS. In addition, MIM further improves model performance by integrating multi-frame information while only slightly increasing computational resource consumption.
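
One way to picture a hierarchical classification model of this kind is a coarse head that predicts a sign superclass followed by per-superclass fine heads, so that no single classifier faces the full long-tailed label set. The sketch below illustrates that routing; the module layout and names are assumptions rather than the authors' code.

```python
# Illustrative sketch (not the authors' code): hierarchical traffic-sign classification
# with a coarse superclass head and one fine head per superclass.
import torch
import torch.nn as nn

class HierarchicalClassifier(nn.Module):
    def __init__(self, feat_dim: int, fine_classes_per_group: tuple):
        super().__init__()
        self.coarse_head = nn.Linear(feat_dim, len(fine_classes_per_group))
        self.fine_heads = nn.ModuleList(
            nn.Linear(feat_dim, n_classes) for n_classes in fine_classes_per_group
        )

    def forward(self, embedding: torch.Tensor):
        # embedding: (num_crops, feat_dim) features of detected sign crops
        coarse_logits = self.coarse_head(embedding)
        groups = coarse_logits.argmax(dim=-1)
        # route each crop to the fine head of its predicted superclass
        fine_logits = [self.fine_heads[g](e) for g, e in zip(groups.tolist(), embedding)]
        return coarse_logits, fine_logits
```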

20 pages, 3302 KiB  
Article
Detector–Tracker Integration Framework for Autonomous Vehicles Pedestrian Tracking
by Huanhuan Wang, Lisheng Jin, Yang He, Zhen Huo, Guangqi Wang and Xinyu Sun
Remote Sens. 2023, 15(8), 2088; https://doi.org/10.3390/rs15082088 - 15 Apr 2023
Cited by 4 | Viewed by 2340
Abstract
Pedestrian tracking is an important aspect of autonomous vehicles environment perception in a vehicle running environment. The performance of the existing pedestrian tracking algorithms is limited by the complex traffic environment, the changeable appearance characteristics of pedestrians and the frequent occlusion interaction, which leads to the insufficient accuracy and stability of tracking. Therefore, this paper proposes a detector–tracker integration framework for autonomous vehicle pedestrian tracking. Firstly, a pedestrian objects detector based on the improved YOLOv7 network was established. Space-to-Depth convolution layer was adopted to improve the backbone network of YOLOv7. Then, a novel appearance feature extraction network is proposed, which integrates the convolutional structural re-parameterization idea to construct a full-scale feature extraction block, which is the optimized DeepSORT tracker. Finally, experiments were carried out on MOT17 and MOT20 public datasets and driving video sequences, and the tracking performance of the proposed framework was evaluated by comparing it with the most advanced multi-object tracking algorithms. Quantitative analysis results show that the framework has high tracking accuracy. Compared with DeepSORT, MOTA improves by 2.3% in the MOT17 dataset and MOTA improves by 4.2% in the MOT20 dataset. Through qualitative evaluation on real driving video sequences, the framework proposed in this paper is robust in a variety of climate environments, and can be effectively applied to the pedestrian tracking of autonomous vehicles.
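
The Space-to-Depth convolution mentioned above folds spatial positions into channels before a regular convolution instead of relying on lossy strided downsampling, which helps preserve detail on small or distant pedestrians. A minimal PyTorch sketch of such a block follows, using PixelUnshuffle; the exact placement inside YOLOv7 and the layer settings are assumptions, not the authors' implementation.

```python
# Illustrative sketch: a Space-to-Depth convolution block that folds spatial
# resolution into channels before convolving, instead of strided downsampling.
import torch
import torch.nn as nn

class SPDConv(nn.Module):
    def __init__(self, in_channels: int, out_channels: int, scale: int = 2):
        super().__init__()
        self.unshuffle = nn.PixelUnshuffle(scale)  # (C, H, W) -> (C*scale^2, H/scale, W/scale)
        self.conv = nn.Conv2d(in_channels * scale * scale, out_channels,
                              kernel_size=3, padding=1)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.conv(self.unshuffle(x)))
```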
