# **Advanced Sensing and Safety Control for Connected and Automated Vehicles**

Edited by

Chao Huang, Yafei Wang, Peng Hang, Zhiqiang Zuo and Bo Leng

Printed Edition of the Special Issue Published in *Sensors*

www.mdpi.com/journal/sensors

## **Advanced Sensing and Safety Control for Connected and Automated Vehicles**


Editors

**Chao Huang, Yafei Wang, Peng Hang, Zhiqiang Zuo and Bo Leng**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Editors*

Chao Huang, The Hong Kong Polytechnic University, Hong Kong, China

Yafei Wang, Shanghai Jiao Tong University, Shanghai, China

Peng Hang, Tongji University, Shanghai, China

Zhiqiang Zuo, Tianjin University, Tianjin, China

Bo Leng, Tongji University, Shanghai, China

*Editorial Office* MDPI St. Alban-Anlage 66 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Sensors* (ISSN 1424-8220) (available at: https://www.mdpi.com/journal/sensors/special_issues/CAV).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Volume Number*, Page Range.

**ISBN 978-3-0365-7330-4 (Hbk) ISBN 978-3-0365-7331-1 (PDF)**

© 2023 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.

## **Contents**




## **About the Editors**

#### **Chao Huang**

Dr. Chao Huang is currently a research assistant professor in the Department of Industrial and Systems Engineering at The Hong Kong Polytechnic University. She received a bachelor's degree in Control Engineering from the China University of Petroleum, Beijing, and a Ph.D. degree from the University of Wollongong, Australia. She has published 45 journal papers, 21 conference papers, and 2 books; her work has received 1397 citations (h-index: 24, Google Scholar).

#### **Yafei Wang**

Yafei Wang (Member, IEEE) received his B.S. degree from Jilin University, Changchun; his M.S. degree from Shanghai Jiao Tong University, Shanghai, China; and his Ph.D. degree from The University of Tokyo, Tokyo, Japan. He is currently an associate professor at the School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai. His research interests include state estimation and control for connected and automated vehicles.

#### **Peng Hang**

Dr. Peng Hang received his Ph.D. degree from the School of Automotive Studies, Tongji University, Shanghai, China, in 2019. From 2020 to 2022, he served as a research fellow with the School of Mechanical and Aerospace Engineering, Nanyang Technological University, Singapore. He is currently a research professor at the Department of Traffic Engineering, Tongji University. His research interests include vehicle dynamics and control, decision making, and motion control of autonomous vehicles. He has been a visiting researcher with the Department of Electrical and Computer Engineering, National University of Singapore, and a software engineer in the Research & Advanced Technology Department, SAIC Motor, China. He has received many awards and honors, including the Excellent Doctoral Thesis of Tongji University and the APAC Excellent Paper Award. He serves as a Topic Editor or Guest Editor for Games, Vehicles, Autonomous Intelligent Systems, and Actuators.

#### **Zhiqiang Zuo**

Prof. Zhiqiang Zuo received his M.S. degree in control theory and control engineering in 2001 from Yanshan University and his Ph.D. degree in control theory in 2004 from Peking University, China. In 2004, he joined the School of Electrical and Information Engineering, Tianjin University, where he is a full professor. From 2008 to 2010, he was a research fellow in the Department of Mathematics, City University of Hong Kong. From 2013 to 2014, he was a visiting scholar at the University of California, Riverside. His research interests include nonlinear control, robust control, and multi-agent systems. He serves as the Associate Editor of the Journal of the Franklin Institute (Elsevier).

#### **Bo Leng**

Dr. Leng received his B.S. and Ph.D. degrees in vehicle engineering from Tongji University, Shanghai, China, in 2014 and 2019, respectively. He is currently working as a postdoctoral researcher in both the School of Automotive Studies and the Postdoctoral Station of Mechanical Engineering at Tongji University. His research interests include vehicle dynamics and control, parameter estimation, and intelligent vehicle trajectory motion control. He won the first prize of the China Automobile Industry Technology Invention Award in 2020 and was selected for the 7th Young Talent Scholar Program of China Association for Science and Technology.

## **Preface to "Advanced Sensing and Safety Control for Connected and Automated Vehicles"**

This Special Issue of *Sensors* aims to report on some of the recent research efforts on this increasingly important topic. The 11 accepted papers cover vehicle position estimation [1], 3D object detection [2], pedestrian state sensing [3], trajectory prediction [4], criticality assessment [5], active fault-tolerant control for actuator failure [6], decision-making for highway off-ramp driving [7] and uncertain interactive traffic scenarios [8], an RRT-based path planning algorithm [9], a C/GMRES motion planning algorithm [10], and lane-keeping controller design [11]. In this introduction, a brief description of each contribution forming the Special Issue is provided.

Localization, a vehicle's capacity to ascertain its position and orientation, is one of the crucial capabilities needed to accomplish fully autonomous driving [1]. The first article describes a general, modular image processing pipeline that improves the robustness and accuracy of feature-based visual odometry (VO) techniques, allowing for better pose estimation. Contrast-limited adaptive histogram equalization (CLAHE), suppression via square covering (SSC), and angle-based outlier rejection (AOR) are the three stages of the proposed pipeline, each addressing a different source of pose estimation error. The pipeline is designed to be generic and modular so that it can be incorporated into any existing feature-based method to improve its performance. According to the quantitative and qualitative results, the proposed pipeline offers a large improvement in VO accuracy and resilience with only a slight increase in processing time.

Ref. [2] proposes an easy-to-use technique for up-sampling low-resolution point clouds, which improves the accuracy of 3D object detection by reconstructing objects rendered in sparse point cloud data into denser data. First, 4-Chs is used to transform the 3D point cloud dataset into a 2D range image. To preserve the forms of the original object throughout the reconstruction, the interpolation on the empty space is calculated based on both the pixel distance and the range values of six neighboring points.

In [3], a novel pedestrian status-detection approach based on multi-sensor fusion is developed for automated patrol vehicles. First, the 2D and 3D pedestrian data are acquired through YOLO V4 and Euclidean clustering algorithms, respectively. A novel multi-sensor fusion method based on the R-Tree algorithm is then designed to detect pedestrians, and body temperature is measured using thermal imaging. The experimental results on an automated patrol vehicle indicated that the single-sensor algorithm's pedestrian detection results could be improved by more than 17% and that the multi-layer fusion method provided more interpretable density estimation results.

A multi-head attention-based LSTM sequence-to-sequence model is designed in [4] to predict the future trajectories of the vehicles surrounding the ego vehicle. The social and temporal interactions are extracted using a multi-head transformer method, and each is encoded into the input as hidden information by an LSTM encoder in the encoder module. The hidden information is decoded by the decoder module, which uses an additional multi-head attention layer to increase the accuracy of subsequent decoding. According to ablation studies, the proposed model resolves the cumulative error caused by the transformer's autoregressive decoding behavior. The visualization results demonstrate the model's robustness in challenging congested conditions.

In [5], a novel risk assessment method based on the time-to-react measurement is proposed to determine overall criticality. The novelty of this method lies in the introduction of variable threshold values, which are converted into the time domain from minimal safety distance metrics and acceleration values for criticality level determination. The designed variable criticality threshold closes a research gap by accounting for the kinematic relationships between vehicles and, as a result, the variable nature of criticality. The proposed method is shown to simplify the consideration of the vehicles surrounding the ego vehicle and to support the evaluation of possible evasive actions.

In [6], an active fault-tolerant control (AFTC) system is designed for the longitudinal motion control of autonomous mobile robots (AMRs) under actuator braking failure. Specifically, a velocity-tracking controller with integrated feedforward and feedback control is designed for normal longitudinal driving. Once a key actuator failure is detected, the driving/braking torques of the remaining healthy actuators are reallocated based on the weighted least-squares (WLS) method for failure compensation. The simulation results show that the proposed AFTC algorithm can handle the braking failure of in-wheel motors (IWMs) while still realizing braking energy recovery. Ref. [7] proposes a generalized single-vehicle-based graph neural network reinforcement learning (SGRL) algorithm that incorporates interactive information between agents in the environment into the decision-making process of autonomous driving. The proposed SGRL algorithm uses a single-agent training approach, constructs a clearer incentive reward function, and greatly increases the dimension of the action space compared with the conventional deep Q-network (DQN) algorithm. The simulation results show that the proposed SGRL algorithm excels in terms of network convergence, decision-making impact, and training effectiveness.

Similar to [7], a reward function matrix for training different decision-making modes is provided in [8], with emphasis placed on decision-making styles as well as on rewards and penalties. A decision-weighted coefficient matrix, an incentive–punishment weighted coefficient matrix, and a reward–penalty function matrix are all included in the proposed reward function matrix. The simulation results demonstrate that the proposed reward function can significantly increase the algorithm's stability and speed of convergence. Ref. [9] proposes an improved heuristic Bi-RRT algorithm to obtain a smooth, asymptotically optimal path with continuous curvature, with high efficiency and accuracy in an uncertain dynamic environment. The consideration of the driver's driving habits and the obstacle-free direct connection mode of the two trees, together with the introduction of a greedy step size and the design of path reorganization, expand the nodes more effectively, smooth the path, and ensure the ride comfort of the vehicle. The simulation results show that the proposed algorithm generates the smoothest path and takes the shortest time compared with existing methods.

In [10], a computationally efficient motion planning method is proposed that both considers traffic interactions and accelerates calculation. A nonlinear predictive controller is designed to dynamically generate a path by considering the predicted trajectories of other traffic participants. In addition, the C/GMRES algorithm is used to accelerate the calculation of trajectory generation. The simulation experiments show that the proposed planning algorithm produces more rational motion plans. Ref. [11] proposes a cooperative lane-keeping control strategy that integrates driver monitoring, a variable level of assistance allocation, and human-in-the-loop control. The relationship between lateral acceleration, road curvature, and the observed maximum driver torque is first used to identify a time-varying physical driver loading pattern. An adaptive driver activity function is then developed to estimate the amount of support the driver needs, and this is combined with a monitored driver state that signals the driver's mental load. A higher-order sliding mode controller is developed to maintain closed-loop stability while seamlessly switching control between different modes based on the generated amounts of assistance. Additionally, conflict is reduced by using a new sharing parameter that is proportional to the torques originating from both the driver and the autonomous controller.

In summary, CAVs hold great potential in the areas of traffic perception, decision-making, and driving safety improvement. This Special Issue contributes by proposing solutions to the general problems of state estimation [1], object detection [2,3], trajectory prediction [4], driving safety improvement [5,6], decision-making [7,8], path planning [9,10], and vehicle motion control [11], which largely represent the theoretical challenges and practical interest of this research topic. Finally, we wish to thank the authors, reviewers, and journal staff for their commitment and effort, without which we could not have completed this Special Issue on time. We hope that readers enjoy the articles and that the published works contribute to the progress of connected and automated vehicles.

#### **Chao Huang, Yafei Wang, Peng Hang, Zhiqiang Zuo, and Bo Leng** *Editors*


## *Editorial* **Advanced Sensing and Safety Control for Connected and Automated Vehicles**

**Chao Huang <sup>1,\*</sup>, Yafei Wang <sup>2</sup>, Peng Hang <sup>3</sup>, Zhiqiang Zuo <sup>4</sup> and Bo Leng <sup>5</sup>**


The connected and automated vehicle (CAV) is a promising technology, anticipated to enhance the safety and effectiveness of mobility. Advanced sensing technologies and control algorithms, which acquire environmental data, analyze the data, and regulate vehicle movements, are key functional components of CAVs. In recent years, the creation of innovative sensing technologies for CAVs has gained substantial attention. Thanks to these advances in sensing, CAVs can now interpret sensory data to more accurately detect obstacles, determine their own positions, navigate autonomously in dynamic environments, and communicate with other nearby vehicles. Additionally, by utilizing computer vision and other sensing techniques, the body movements, facial expressions, and even mental states of in-cabin occupants can be identified.

This Special Issue of *Sensors* aims to report on some of the recent research efforts on this increasingly important topic. The 11 accepted papers cover vehicle position estimation [1], 3D object detection [2], pedestrian state sensing [3], trajectory prediction [4], criticality assessment [5], active fault-tolerant control for actuator failure [6], decision-making for highway off-ramp driving [7] and uncertain interactive traffic scenarios [8], an RRT-based path planning algorithm [9], a C/GMRES motion planning algorithm [10], and lane-keeping controller design [11]. In this introduction, a brief description of each contribution forming the Special Issue is provided.

Localization, a vehicle's capacity to ascertain its position and orientation, is one of the crucial capabilities needed to accomplish fully autonomous driving [1]. The first article describes a general, modular image processing pipeline that improves the robustness and accuracy of feature-based visual odometry (VO) techniques, allowing for better pose estimation. Contrast-limited adaptive histogram equalization (CLAHE), suppression via square covering (SSC), and angle-based outlier rejection (AOR) are the three stages of the proposed pipeline, each addressing a different source of pose estimation error. The pipeline is designed to be generic and modular so that it can be incorporated into any existing feature-based method to improve its performance. According to the quantitative and qualitative results, the proposed pipeline offers a large improvement in VO accuracy and resilience with only a slight increase in processing time.

Ref. [2] proposes an easy-to-use technique for up-sampling low-resolution point clouds, a process which improves the accuracy of 3D object detection by reconstructing objects rendered in sparse point cloud data into more dense data. First, 4-Chs is used to transform the 3D point cloud dataset into a 2D range image. In order to preserve the forms of the original object throughout the reconstruction, the interpolation on the empty space is calculated based on both the pixel distance and the range values of six neighbor points.
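The point-cloud-to-range-image conversion can be sketched with a standard spherical projection (the image size and vertical field of view below are illustrative parameters, not the paper's):

```python
import numpy as np

def pointcloud_to_range_image(points, h=32, w=256,
                              fov_up=np.deg2rad(15.0),
                              fov_down=np.deg2rad(-15.0)):
    """Project an (N, 3) point cloud onto an (h, w) range image using
    spherical coordinates; pixels with no projected point stay 0."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)                                   # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-9), -1.0, 1.0))
    u = ((1.0 - (yaw + np.pi) / (2.0 * np.pi)) * w).astype(int) % w
    v = np.clip((fov_up - pitch) / (fov_up - fov_down) * h, 0, h - 1).astype(int)
    img = np.zeros((h, w))
    img[v, u] = r                                            # last-written range wins per pixel
    return img

pts = np.random.default_rng(1).uniform(-10, 10, size=(1000, 3))
rimg = pointcloud_to_range_image(pts)
```

Interpolating the empty pixels of such an image, as the paper does from neighboring range values, then densifies the cloud after back-projection.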

In [3], a novel pedestrian status-detecting approach, based on multi-sensor fusion, is developed for automated patrol vehicles. First, the 2D and 3D pedestrian data are acquired

**Citation:** Huang, C.; Wang, Y.; Hang, P.; Zuo, Z.; Leng, B. Advanced Sensing and Safety Control for Connected and Automated Vehicles. *Sensors* **2023**, *23*, 1037. https:// doi.org/10.3390/s23021037

Received: 9 January 2023 Accepted: 11 January 2023 Published: 16 January 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

through YOLO V4 and Euclidean clustering algorithms, respectively. A novel multi-sensor fusion method based on the R-Tree algorithm is then designed to detect pedestrians, and body temperature is measured using thermal imaging. The experimental results on an automated patrol vehicle indicated that the single-sensor algorithm's pedestrian detection results could be improved by more than 17% and that the multi-layer fusion method provided more interpretable density estimation results.

A multi-head attention-based LSTM sequence-to-sequence model is designed in [4] to predict the future trajectories of the vehicles surrounding the ego vehicle. The social and temporal interactions are extracted using a multi-head transformer method, and each is encoded into the input as hidden information by an LSTM encoder in the encoder module. The hidden information is decoded by the decoder module, which uses an additional multi-head attention layer to increase the accuracy of subsequent decoding. According to ablation studies, the proposed model resolves the cumulative error caused by the transformer's autoregressive decoding behavior. The visualization results demonstrate the model's robustness in challenging congested conditions.
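The core building block of such a model, scaled dot-product attention split across several heads, can be sketched in NumPy (shapes and names are illustrative, not the authors' implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(q, k, v, n_heads):
    """Scaled dot-product attention computed independently per head.
    q, k, v: (seq_len, d_model) arrays; d_model must divide evenly by n_heads."""
    seq_len, d_model = q.shape
    d_head = d_model // n_heads
    out = np.empty_like(q)
    for h in range(n_heads):
        s = slice(h * d_head, (h + 1) * d_head)
        scores = q[:, s] @ k[:, s].T / np.sqrt(d_head)   # (seq_len, seq_len)
        out[:, s] = softmax(scores, axis=-1) @ v[:, s]   # attention-weighted values
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))                # 5 timesteps, d_model = 8
y = multi_head_attention(x, x, x, n_heads=2)   # self-attention over the sequence
```

In the paper's setting, such layers sit between the LSTM encoder and decoder to weight social and temporal interactions.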

In [5], a novel risk assessment method based on the time-to-react measurement is proposed to determine overall criticality. The novelty of this method lies in the introduction of variable threshold values, which are converted into the time domain from minimal safety distance metrics and acceleration values for criticality level determination. The designed variable criticality threshold closes a research gap by accounting for the kinematic relationships between vehicles and, as a result, the variable nature of criticality. The proposed method is shown to simplify the consideration of the vehicles surrounding the ego vehicle and to support the evaluation of possible evasive actions.

In [6], an active fault-tolerant control (AFTC) system is designed for the longitudinal motion control of autonomous mobile robots (AMRs) under actuator braking failure. Specifically, a velocity-tracking controller with integrated feedforward and feedback control is designed for normal longitudinal driving. Once a key actuator failure is detected, the driving/braking torques of the remaining healthy actuators are reallocated based on the weighted least-squares (WLS) method for failure compensation. The simulation results show that the proposed AFTC algorithm can handle the braking failure of in-wheel motors (IWMs) while still realizing braking energy recovery.
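The reallocation step can be illustrated with a generic weighted least-squares control allocation; this is a minimal sketch, not the paper's exact formulation, and the actuator effectiveness matrix and torque demand below are hypothetical:

```python
import numpy as np

def wls_allocate(B, v, weights):
    """Minimize || diag(weights) @ u ||^2 subject to B @ u = v.
    Closed-form weighted minimum-norm solution; assumes B @ Wi @ B.T is invertible."""
    Wi = np.diag(1.0 / np.asarray(weights, dtype=float) ** 2)
    return Wi @ B.T @ np.linalg.solve(B @ Wi @ B.T, v)

# Four in-wheel motors; a detected failure zeroes out the third motor's
# effectiveness column so the allocator cannot route torque to it.
B = np.array([[1.0, 1.0, 0.0, 1.0]])    # hypothetical effectiveness matrix
v = np.array([400.0])                   # total braking torque demand [N*m]
u = wls_allocate(B, v, weights=[1.0, 1.0, 1.0, 1.0])
# With equal weights the demand splits evenly over the three healthy motors.
```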

Ref. [7] proposes a generalized single-vehicle-based graph neural network reinforcement learning (SGRL) algorithm that incorporates interactive information between agents in the environment into the decision-making process of autonomous driving. The proposed SGRL algorithm uses a single-agent training approach, constructs a clearer incentive reward function, and greatly increases the dimension of the action space compared with the conventional deep Q-network (DQN) algorithm. The simulation results show that the proposed SGRL algorithm excels in terms of network convergence, decision-making impact, and training effectiveness.

Similar to [7], a reward function matrix for training different decision-making modes is provided in [8], with emphasis placed on decision-making styles as well as on rewards and penalties. A decision-weighted coefficient matrix, an incentive–punishment weighted coefficient matrix, and a reward–penalty function matrix are all included in the proposed reward function matrix. The simulation results demonstrate that the proposed reward function can significantly increase the algorithm's stability and speed of convergence.

Ref. [9] proposes an improved heuristic Bi-RRT algorithm to obtain a smooth, asymptotically optimal path with continuous curvature, with high efficiency and accuracy in an uncertain dynamic environment. The consideration of the driver's driving habits and the obstacle-free direct connection mode of the two trees, together with the introduction of a greedy step size and the design of path reorganization, expand the nodes more effectively, smooth the path, and ensure the ride comfort of the vehicle. The simulation results show that the proposed algorithm generates the smoothest path and takes the shortest time compared with existing methods.

In [10], a computationally efficient motion planning method is proposed that both considers traffic interactions and accelerates calculation. A nonlinear predictive controller is designed to dynamically generate a path by considering the predicted trajectories of other traffic participants. In addition, the C/GMRES algorithm is used to accelerate the calculation of trajectory generation. The simulation experiments show that the proposed planning algorithm produces more rational motion plans.

Ref. [11] proposes a cooperative lane-keeping control strategy that integrates driver monitoring, a variable level of assistance allocation, and human-in-the-loop control. The relationship between lateral acceleration, road curvature, and the observed maximum driver torque is first used to identify a time-varying physical driver loading pattern. An adaptive driver activity function is then developed to estimate the amount of support the driver needs, and this is combined with a monitored driver state that signals the driver's mental load. A higher-order sliding mode controller is developed to maintain closed-loop stability while seamlessly switching control between different modes based on the generated amounts of assistance. Additionally, conflict is reduced by using a new sharing parameter that is proportional to the torques originating from both the driver and the autonomous controller.

In summary, CAVs hold great potential in the areas of traffic perception, decision-making, and driving safety improvement. This Special Issue contributes by proposing solutions to the general problems of state estimation [1], object detection [2,3], trajectory prediction [4], driving safety improvement [5,6], decision-making [7,8], path planning [9,10], and vehicle motion control [11], which largely represent the theoretical challenges and practical interest of this research topic. Finally, we wish to thank the authors, reviewers, and journal staff for their commitment and effort, without which we could not have completed this Special Issue on time. We hope that readers enjoy the articles and that the published works contribute to the progress of connected and automated vehicles.

**Funding:** This work received funding from the PolyU (UGC), via grant A0040253 associated with grant A0039179.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

## *Article* **A Generic Image Processing Pipeline for Enhancing Accuracy and Robustness of Visual Odometry**

**Mohamed Sabry <sup>1,\*</sup>, Mostafa Osman <sup>2</sup>, Ahmed Hussein <sup>3</sup>, Mohamed W. Mehrez <sup>2</sup>, Soo Jeon <sup>2</sup> and William Melek <sup>2</sup>**


<sup>3</sup> Intelligent Driving Function Department, IAV GmbH, 10587 Berlin, Germany

**\*** Correspondence: mohamed.sabry@alumnos.uc3m.es

**Abstract:** The accuracy of pose estimation from feature-based Visual Odometry (VO) algorithms is affected by several factors such as lighting conditions and outliers in the matched features. In this paper, a generic image processing pipeline is proposed to enhance the accuracy and robustness of feature-based VO algorithms. The pipeline consists of three stages, each addressing a problem that affects the performance of VO algorithms. The first stage tackles the lighting condition problem, where a filter called Contrast Limited Adaptive Histogram Equalization (CLAHE) is applied to the images to overcome changes in lighting in the environment. The second stage uses the Suppression via Square Covering (SSC) algorithm to ensure the features are distributed properly over the images. The last stage proposes a novel outlier rejection approach called the Angle-based Outlier Rejection (AOR) algorithm to remove the outliers generated in the feature matching process. The proposed pipeline is generic and modular and can be integrated with any type of feature-based VO (monocular, RGB-D, or stereo). The efficiency of the proposed pipeline is validated using sequences from the KITTI (for stereo VO) and TUM (for RGB-D VO) datasets, as well as experimental sequences using an omnidirectional mobile robot (for monocular VO). The obtained results show the performance gains achieved by enhancing the accuracy and robustness of the VO algorithms without compromising the computational cost; the results are substantially better than those obtained without the pipeline.

**Keywords:** visual odometry; image processing pipeline; computer vision; robot operating system (ROS)

#### **1. Introduction**

Throughout the past few years, interest in autonomous robotic systems has increased drastically. One of the key modules required to achieve complete autonomy is localization, which is the ability of a mobile platform to determine its position and orientation. Currently, several sensors are used to achieve localization, including LiDAR [1], radar [2], Global Positioning System (GPS) [3], Inertial Measurement Unit (IMU) [4], wheel encoders [5], and cameras [6].

One of the common methods for localization using cameras is through Visual Odometry (VO) [7,8]. VO estimates the ego-motion of a camera by determining the incremental motion between the successive camera frames. Like other odometry methods (wheel encoders, LiDAR, and so on), VO relies on integrating the incremental motion between successive frames to compute the overall trajectory of the camera, leading to drift errors over long distances.
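The drift mechanism is easy to reproduce: composing many incremental motions lets even a tiny per-step heading error compound into a large position error. A minimal 2D sketch with illustrative numbers:

```python
import numpy as np

def compose(pose, delta):
    """Compose a 2D pose (x, y, theta) with an incremental motion
    (dx, dy, dtheta) expressed in the current body frame."""
    x, y, th = pose
    dx, dy, dth = delta
    return (x + dx * np.cos(th) - dy * np.sin(th),
            y + dx * np.sin(th) + dy * np.cos(th),
            th + dth)

# Drive "straight" for 100 steps of 1 m each, but let every step carry a
# tiny heading error of 0.001 rad: negligible per step, yet the
# integration compounds it into a lateral offset of several meters.
pose = (0.0, 0.0, 0.0)
for _ in range(100):
    pose = compose(pose, (1.0, 0.0, 0.001))
```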

To avoid drift errors, VO is usually integrated into a Simultaneous Localization and Mapping (SLAM) system with a loop-closure module to correct the drift error [9–11]. Although using SLAM is beneficial for acquiring a map of the environment, it incurs

**Citation:** Sabry, M.; Osman, M.; Hussein, A.; Mehrez, M.W.; Jeon, S.; Melek, W. A Generic Image Processing Pipeline for Enhancing Accuracy and Robustness of Visual Odometry. *Sensors* **2022**, *22*, 8967. https://doi.org/10.3390/s22228967

Academic Editors: Yafei Wang, Chao Huang, Peng Hang, Zhiqiang Zuo and Bo Leng

Received: 27 October 2022 Accepted: 14 November 2022 Published: 19 November 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

unnecessary computational costs when the map is already known. A better solution could be using an accurate low-drift odometry module alongside localization in a pre-acquired map [12].

Several VO algorithms have been developed in the literature aiming to increase pose estimation accuracy. These algorithms are classified by the type of camera used in the motion estimation as monocular, stereo, and RGB-D VO. Alternatively, they can be classified by the method of motion estimation into feature-based and direct VO [13].

Monocular VO uses images captured by only one camera to determine its trajectory. This technique usually relies on Structure from Motion (SFM) [14–17]. With one camera, the motion of the robot can be captured up to an unobservable scale. This scale can then be determined through the use of an external velocity measurement from a wheel encoder or an IMU. Several researchers developed methods for estimating the scale of monocular VO without the use of external sensors, such as [18,19].

Lately, researchers have started developing deep-learning techniques for monocular VO [20,21]. Unlike the monocular VO, the stereo and the RGB-D VO estimate the full pose of the vehicle without any external measurements [22–24].

All three categories of VO can be either direct or feature-based approaches. On the one hand, the feature-based method relies on visual features for calculating the transformation between consecutive frames. For the detection of such features, a feature detector and descriptor are used such as Scale Invariant Features Transform (SIFT) [25], Speeded-up Robust Features (SURF) [26], or Oriented FAST and Rotated BRIEF (ORB) [27] (see [28] or [29] for a comparison between the different detectors and descriptors). On the other hand, direct approaches compute the relative transformation between frames based on the whole-image intensities [30,31].

Although such VO techniques are well designed, they share a common drawback: they rely on integration and may therefore accumulate drift errors due to, for example, false-matched features, poor lighting and illumination, random noise, or motion bias [32].

To overcome the above-mentioned issues, this paper proposes a generic, modular image processing pipeline to enhance the accuracy and robustness of feature-based VO algorithms, independent of the camera type (monocular, RGB-D, or stereo vision). The proposed pipeline adds filtration and pre-processing stages based on Contrast Limited Adaptive Histogram Equalization (CLAHE) with adaptive thresholding to dynamically cope with variable lighting conditions. Additionally, Suppression via Square Covering (SSC) is used to distribute the extracted features more evenly across the image and thus reduce bias in the motion estimation [33]. Finally, a novel outlier rejection algorithm, called Angle-based Outliers Rejection (AOR), is proposed to reject false-matched features as well as features captured on moving objects in the scene. The pipeline is integrated into monocular, RGB-D, and stereo VO and validated using the KITTI [34] and TUM [35] datasets, as well as experimental sequences generated by an omnidirectional mobile robot.

The contribution of this paper is integrating the CLAHE filter, the SSC algorithm, and the proposed AOR algorithm into the VO. The results show that the three stages play an integral part in enhancing the performance of VO while overcoming the drawbacks of every individual stage.

The remainder of this paper is organized as follows. Section 2 reviews the related work, and Section 3 explains the proposed pipeline and its filtration steps. Section 4 presents the experimental work, including the implementation details, the datasets used, and the evaluation metrics. Section 5 presents the results and discussion. Finally, Section 6 provides concluding remarks and future work.

#### **2. Related Works**

#### *2.1. Image Filtration*

Several works have addressed the problem of noise in images for VO enhancement. The noise may be attributed to poor lighting conditions caused by light source flare, random visual sensor noise, or other noise sources [36].

In [37], a direct VO algorithm using binary descriptors was used to overcome poor lighting conditions. The authors showed that the algorithm performed robustly and efficiently even under low lighting, owing to the illumination-invariance property of the binary descriptor within a direct alignment framework. However, the VO algorithm proposed therein is a direct method, which is usually more computationally expensive than feature-based VO.

In [38], a method for reducing drift in VO was introduced. The authors developed a new descriptor, the SYnthetic BAsis (SYBA) descriptor, to reduce false-matched features with the help of a sliding window approach: the feature matching step is applied to features within a window instead of matching features between two consecutive frames only. Although a sliding window can indeed increase the accuracy of feature matching, it also significantly increases the computational cost of the matching task.

In [39], a robust feature-matching scheme was combined with an effective anti-blurring frame. The algorithm uses the singular value decomposition to mitigate the effect of blurring due to vibrations or other factors.

In [40], a stereo visual SLAM algorithm was proposed, which uses CLAHE to locally enhance the image contrast and obtain more feature details. The CLAHE-enhanced SLAM algorithm was compared to the results of a VO enhanced by a conventional histogram equalization and the results of ORB-SLAM2 [11]. The results showed a superior performance of the CLAHE-enhanced algorithm compared to the other algorithms. Furthermore, in [41], a robust VO for underwater environments was proposed. In order to overcome the turbid image quality of underwater imaging, the authors used CLAHE for contrast enhancement. The authors showed that the use of CLAHE resulted in brighter and larger visible regions. As a result, unclear structures were significantly reduced.

Therefore, in this paper, CLAHE is selected as a pre-processing stage for the camera frames to overcome the effect of poor lighting conditions.

#### *2.2. Non-Maximal Suppression*

Non-maximal suppression can be used to avoid poor distribution of features over the image, which leads to poor VO performance and motion bias. Several non-maximal suppression algorithms were used in VO [42,43]. In [44], a feature descriptor was proposed to facilitate fast feature matching processing while preserving matching reliability. The authors chose to use the FAST (Features from Accelerated Segment Test) detector [45] along with a non-maximal suppression algorithm.

In [46], a stereo/RGB-D VO was proposed for mobile robots. Therein, the authors used the adaptive non-maximal suppression introduced in [47] to enhance the performance of the feature detector algorithm BRIEF (Binary Robust Independent Elementary Features) [48] by ensuring uniform distribution of features over the image.

In [33], three new and efficient adaptive non-maximal suppression approaches were introduced, including the SSC algorithm, and their positive impact on visual SLAM was demonstrated. The authors of [33] showed that the outputs of the three algorithms are visually and statistically similar; however, SSC showed lower computational costs, which makes it more suitable for real-time applications such as VO.

Although the authors of [33] showed the effect of the SSC on the enhancement of the output of a visual SLAM algorithm, to the best of the authors' knowledge, the SSC algorithm was not used in any other VO or visual SLAM algorithm afterward. In this paper, SSC is selected as an additional stage for feature detection and matching to avoid the bias in the motion estimation due to poor distribution of the features.

#### *2.3. Outliers Rejection*

Feature-based VO relies on feature detection and matching for motion estimation. Feature matching algorithms commonly generate a considerable number of false-matched features [7], mainly due to the limitations of local feature matching. These false matches lead to increased motion estimation error, or even complete divergence of the VO output, as well as increased computational costs. Several works in the literature have addressed this problem. In [49], an iterative outlier rejection scheme for stereo-based VO was proposed, designed to improve motion estimation in high-speed and large-scale depth environments.

In [50], a stereo VO was proposed that relies on using reference frames instead of all frames. This was accomplished by first selecting the stable features from a frame using quad-matching testing and grid-based motion statistics. Afterward, the features in this frame were matched to the features in a reference frame (instead of the previous frame), which contained the stable features found in the current frame.

A commonly used outlier rejection approach is the Random Sampling Consensus (RANSAC). RANSAC is an iterative outlier rejection algorithm, which relies on the computation of model hypotheses, from a randomly selected set of the matched features, followed by the verification of the hypotheses using the rest of the matched features [8]. In [51], a stereo VO algorithm was proposed, which uses a RANSAC-based outliers rejection along with an iterated sigma point Kalman filter to achieve robust frame-to-frame VO performance. Although RANSAC is effective in removing outliers, the iterative process sometimes results in poor performance due to a large number of iterations for convergence. Furthermore, if the number of outliers in the matched points is large, this may lead to wrong convergence entailing incorrect motion estimation.

Hence, in this paper, a non-iterative outliers rejection algorithm is proposed, which relies on the angular distance of the matched features in the matched frames. The algorithm can be incorporated before RANSAC in order to reduce the number of iterations required for convergence and thus increase the overall accuracy of motion estimation while reducing the computational cost of the algorithm.

#### **3. Proposed Pipeline**

In this section, the components of the proposed image processing pipeline are introduced. The flow chart of the pipeline is shown in Figure 1.

**Figure 1.** Proposed pipeline for the robust feature-based VO with the added filtration stages highlighted in red.

#### *3.1. Image Pre-Processing*

The pre-processing stage consists of applying a simple blurring filter to remove some noise in the image, followed by applying an Adaptive Histogram Equalization (AHE) technique, namely CLAHE [52]. CLAHE is applied to each input frame to enable the feature detector to find a sufficient number of features per frame. Although traditional AHE techniques tend to over-amplify the noise in the nearly constant regions in an image, the CLAHE filter prevents this over-amplification by limiting the histogram values. The effect of the CLAHE is shown in Figure 2.


**Figure 2.** The effect of CLAHE filter on the image. Top: the image before applying CLAHE, Bottom: the image after applying the CLAHE algorithm with the adaptive thresholding. The image was taken from the KITTI dataset.

Moreover, to ensure that the CLAHE filter adapts to different lighting conditions, its threshold value is made adaptive: it is set to the ratio of the intensity range (maximum minus minimum) to the median intensity of the frame, as presented in (1). This adaptation enables the CLAHE filter to cope with the changing lighting conditions encountered during operation of the mobile platform and avoids deterioration of the VO performance caused by excessively high or low brightness in the images. Specifically, the contrast at which the CLAHE filter clips the histogram is computed as

$$\tau\_k = \frac{\max(I\_k) - \min(I\_k)}{\text{median}(I\_k)} \tag{1}$$

where *τk* and *Ik* are the contrast threshold for the CLAHE filter and the 2D image data at the *k*-th time step, respectively.

An example of the output of the CLAHE filter is shown in Figure 2. The effect of the sun can be seen in the original image, which leads to bright regions in the top middle of the image and dark regions on the left and the right of the image. Applying CLAHE results in decreasing the effect of the sunlight on the image and increasing the amount of extractable information, which can then be used by the feature detector algorithm to capture more stable features from the image.
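As a sketch, the adaptive clip limit of Eq. (1) can be computed as follows; the zero-median guard and the subsequent use of OpenCV's CLAHE with an 8 × 8 tile grid are our assumptions, not details stated in the paper:

```python
import numpy as np

def clahe_clip_limit(frame_gray):
    """Compute the adaptive CLAHE clip limit tau_k of Eq. (1)."""
    i = np.asarray(frame_gray, dtype=np.float64)
    med = np.median(i)
    # Eq. (1): tau_k = (max(I_k) - min(I_k)) / median(I_k).
    # Guard against an all-dark frame with near-zero median (assumed safeguard).
    return float((i.max() - i.min()) / max(med, 1.0))

# The clip limit would then be passed to OpenCV's CLAHE implementation, e.g.:
#   clahe = cv2.createCLAHE(clipLimit=tau, tileGridSize=(8, 8))
#   enhanced = clahe.apply(frame_gray)
```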

#### *3.2. Features Detection and Matching*

After the image pre-processing, the current frame *Ik* is passed to a feature detector. The extracted set of features, denoted by F*k*, is then matched with the feature set F*k*−1 of the reference frame. The set of matched features P*k*−1:*k* is then used to estimate the incremental motion between the two frames *Ik*−1 and *Ik*.

One of the causes of error in the motion estimation is the nonuniform distribution of features associated with the image [32,46]. Another cause of poor motion estimation is the presence of a high number of outliers in the detected and matched features. Therefore, in this paper, we added the SSC as well as the proposed AOR steps to the proposed pipeline.

#### 3.2.1. Suppression via Square Covering

Before passing the feature set F*<sup>k</sup>* to the feature matching algorithm, the features are first passed to the SSC [33] algorithm to make sure the captured features are homogeneously distributed over the whole captured image *Ik*.

The SSC algorithm is an approximation of Suppression via Disk Covering (SDC), which relies on an approximate nearest-neighbour algorithm using a randomized search tree. In contrast, the SSC achieves comparable results with a single query operation per search-range guess, giving it better efficiency and scalability than the SDC. In addition, SSC uses a square approximation of the disk covering to avoid computing the Euclidean distance between a large number of features. This allows the SSC algorithm to run with lower complexity as the number of features increases.
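The square-covering idea can be illustrated with a greedy suppression pass; this simplified sketch omits the binary search over the suppression distance and the grid acceleration that make the actual SSC [33] efficient, and the function name is ours:

```python
import numpy as np

def suppress_via_squares(xy, scores, min_dist):
    """Greedy square-covering suppression (simplified SSC-style sketch).

    Keeps the strongest feature first and drops any later feature whose
    Chebyshev (square) distance to an already-kept feature is below
    min_dist, so surviving features are spread over the image.
    """
    order = np.argsort(-np.asarray(scores, dtype=float))
    kept = []
    for i in order:
        x, y = xy[i]
        # Square covering: compare with the Chebyshev distance, which
        # corresponds to an axis-aligned square of side 2 * min_dist.
        if all(max(abs(x - xy[j][0]), abs(y - xy[j][1])) >= min_dist for j in kept):
            kept.append(int(i))
    return kept
```

The real SSC additionally binary-searches `min_dist` so that the number of survivors matches a requested feature count.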

The effect of using the SSC algorithm is shown in Figure 3. Figure 3a shows the original output of the SURF feature detector, where the feature density is higher in the top right region of the image. As shown in Figure 3b, after applying the SSC, the feature distribution across the image is almost the same (except for regions that did not contain any features).

**Figure 3.** The effect of SSC and AOR on the features detection and matching. The previous frame is shown as the red component of *Ik*−1, and the current frame is shown as the blue component of *Ik*. The green crosses (+) and the blue circles (◦) represent the features F*k*−<sup>1</sup> and F*k*, respectively. Finally, the red lines represent the matched pairs in P*k*−1:*k*. Each of the images represents the following: (**a**) the original feature pairs through using the SURF detector and a brute-force matching algorithm, (**b**) shows the feature pairs after adding the SSC algorithm only. (**c**) shows the feature pairs after adding the AOR algorithm only, and finally (**d**) shows the feature pairs after adding both SSC and AOR. The majority of the outliers were removed due to the large difference between the angles, as illustrated in Figure 4.

Although several feature matching algorithms have been introduced in the literature [53], they generate a considerable number of false-matched points (as can be seen in Figure 3a,b). Motivated by this issue, a novel outlier rejection algorithm is introduced in this paper and integrated into the overall VO pipeline to remove those false-matched points.

#### 3.2.2. Angle-Based Outliers Rejection (AOR) for Feature Matching

To filter the matched points from outliers, a new outlier rejection algorithm called the AOR is proposed. In addition to removing false-matched features, the AOR can remove features whose apparent motion is inconsistent with the ego vehicle's motion (e.g., features on moving objects in the scene). It is applied before the motion estimation stage (RANSAC/LMEDS) to reduce the number of iterations required for convergence, thereby increasing the overall accuracy of the motion estimation while reducing the computational cost of the algorithm.

Usually, during the motion of the camera, the further the feature from the vanishing point of the image, the more the feature moves. Although the amount of movement of a feature in the successive images is different depending on its position, it should be comparable to the motion of other features. False-matched points tend to show a larger amount of feature motion through the image, as shown in Figure 3a. The AOR uses the distance traveled by the feature in the successive images along with the actual position of the feature in those images to remove false-matched points, as illustrated in Figure 4.

**Figure 4.** Visual illustration of *θc* and *θp* for three different feature pairs in an image.

Figure 4 illustrates the two metrics used by the proposed AOR, which can be divided into two steps. First, the angle *θc* between the lines drawn from the center of the image to the feature (shown in Figure 4) in *Ik*−1 and *Ik* is calculated as [54]:

$$\theta\_{c,i} = \arccos \frac{x\_{k-1,i} \, x\_{k,i} + y\_{k-1,i} \, y\_{k,i}}{\sqrt{x\_{k-1,i}^2 + y\_{k-1,i}^2} \cdot \sqrt{x\_{k,i}^2 + y\_{k,i}^2}}, \tag{2}$$

where (*xk*−1,*i*, *yk*−1,*i*) and (*xk*,*i*, *yk*,*i*) are the coordinates of the *i*-th feature in the frames *Ik*−1 and *Ik*, respectively, with respect to the center of the image. Notice that *θc* represents the amount of feature motion, irrespective of the feature's position in the frame. *θc* for different feature positions is illustrated in the top plot of Figure 4.

Second, the Euclidean distance traveled by the feature projected on a reference circle, as shown in the bottom plot of Figure 4, and the corresponding angle *θ<sup>p</sup>* is calculated as [55]:

$$E\_i := \left\| \begin{bmatrix} x\_{k,i} \\ y\_{k,i} \end{bmatrix} - \begin{bmatrix} x\_{k-1,i} \\ y\_{k-1,i} \end{bmatrix} \right\|\_2 \tag{3}$$

$$\theta\_{p,i} = \frac{E\_i}{R} \tag{4}$$

where *R* is the radius of the reference circle. Notice that *R* determines the sensitivity of *θp*: the larger the radius, the smaller the angle obtained for a given Euclidean distance. The radius is calculated as

$$R = \sqrt{\frac{c\_x^2 + c\_y^2}{\zeta}},\tag{5}$$

where *cx* and *cy* are the coordinates of the image center, and *ζ* is a parameter that controls the size of the radius. During experimentation, the best results were obtained with *ζ* = 8; however, different values may work better under different conditions. Notice that we assume here that the vanishing point lies at the center of the image. Although this is not generally true, for ground vehicle motion this assumption did not affect the results obtained by the algorithm. Furthermore, when the algorithm was tested with a vanishing point extraction algorithm [56], the output was similar in accuracy; we therefore chose to omit this step to achieve a faster VO pipeline.

Using the two angles *θ<sup>c</sup>* and *θp*, a score *S* is computed for each feature as:

$$S\_i = \left| \theta\_{c,i} \, \theta\_{p,i} \left( \theta\_{c,i} - \theta\_{p,i} \right) \right|. \tag{6}$$

Notice that, for every matched feature, the larger the values of *θc* and *θp* and the larger the difference between them, the larger the score the AOR yields.

Finally, a feature is selected as an inlier, if its AOR score value is less than a threshold *η* calculated as

$$\eta = c \cdot \text{median}(\mathcal{S}), \tag{7}$$

where S is the set of scores of the matched features, and *c* > 1 is a parameter, set to 2 in this paper (through tuning). The overall AOR algorithm is summarized in Algorithm 1.
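Under the definitions above, the complete AOR test of Eqs. (2)–(7) can be sketched in NumPy; the function name and the clipping of the arccos argument against round-off are our additions:

```python
import numpy as np

def aor_inliers(p_prev, p_curr, cx, cy, zeta=8.0, c=2.0):
    """Angle-based Outlier Rejection, Eqs. (2)-(7), vectorised in NumPy.

    p_prev, p_curr: (N, 2) matched feature coordinates in I_{k-1} and I_k;
    (cx, cy): image centre (assumed vanishing point).
    Returns a boolean inlier mask over the N matches.
    """
    a = np.asarray(p_prev, dtype=float) - [cx, cy]
    b = np.asarray(p_curr, dtype=float) - [cx, cy]
    # Eq. (2): angle at the image centre between the two feature rays
    # (the clip guards arccos against round-off just outside [-1, 1]).
    cos = (a * b).sum(axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    theta_c = np.arccos(np.clip(cos, -1.0, 1.0))
    # Eqs. (3)-(5): Euclidean displacement mapped onto a reference
    # circle of radius R, giving the angle theta_p.
    E = np.linalg.norm(b - a, axis=1)
    R = np.sqrt((cx ** 2 + cy ** 2) / zeta)
    theta_p = E / R
    # Eq. (6): per-feature score; Eq. (7): median-based inlier threshold.
    S = np.abs(theta_c * theta_p * (theta_c - theta_p))
    return S < c * np.median(S)
```

With the default *ζ* = 8 and *c* = 2, matches whose angular motion is grossly inconsistent with the rest receive large scores and are rejected.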

Figure 3c shows the effect of AOR on the detected features. By using AOR, all the outliers, which are present in Figure 3a, are removed, and only the true features describing the motion of the camera remain. Furthermore, notice the effect of the AOR algorithm in removing the features detected on the moving vehicle present in the image since the motion of such features does not agree with the motion of the remaining features. Figure 3d shows the effect of both SSC and AOR on the image, where the inliers remaining in the image are better distributed due to the SSC effect.


The filtered set of matched features *P*ˆ*k*−1:*k* can then be passed to any VO algorithm to estimate the incremental motion of the camera and to compute the odometry.

#### **4. Experimental Work**

To show the generic aspect of the proposed pipeline, simple stereo, RGB-D, and monocular VO algorithms are implemented for validation. The motion estimation techniques used are the same as those described in [7].

The algorithms are implemented in Python using the OpenCV library, with SURF for feature detection; the extracted features are matched between consecutive frames by brute-force matching. All experiments and tests were conducted on a computer with an Intel i7-8850H 6-core processor running at 2.60 GHz with 16 GB of RAM, running Ubuntu 16.04. Furthermore, the algorithms were implemented with a Robot Operating System (ROS) wrapper node to be compatible with the ROS framework [57].

The VO algorithms were then used to estimate the motion of the camera using sequences from KITTI [34], TUM [35], as well as experimental sequences generated by Summit-XL Steel manufactured by Robotnik Inc. [58]. The performance of the pipeline is evaluated through several comparisons, which demonstrate the effect of the added stages to the VO pipeline.

#### *4.1. Motion Estimation*

#### 4.1.1. Stereo/RGB-D Visual Odometry

The stereo and RGB-D VO algorithms used in this paper rely on solving the same 3D-to-2D correspondence problem. First, the features in the image *Ik*−1, along with the disparity map or the depth image, are used to produce the corresponding 3D features in *Ik*−1. The motion of the camera is then estimated by solving the Perspective-n-Point (PnP) problem within a RANSAC scheme [8], after the AOR has been applied, to achieve better and more efficient motion estimation:

$$T\_{k-1:k} = \underset{T\_{k-1:k}}{\text{arg min}} \sum\_{i=0}^{N\_f} \left\| f\_k^i - \hat{f}\_{k-1}^i \right\|\_2^2 \tag{8}$$

where *Tk*−1:*k* ∈ *SE*(3) is the transformation matrix describing the incremental motion between time steps *k*−1 and *k*, *f i k* is the *i*-th 2D feature in the current image, *f*ˆ*i k*−1 is the same 3D feature reprojected from image *Ik*−1 onto the current image *Ik* through *Tk*−1:*k*, and *Nf* is the total number of features in the image.
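The objective of Eq. (8) can be written out explicitly. In practice a solver such as OpenCV's `solvePnPRansac` minimises it over *Tk*−1:*k*, but evaluating the cost for a candidate transformation clarifies the notation; the pinhole intrinsic matrix `K` is an assumed input not spelled out in the text:

```python
import numpy as np

def reprojection_cost(T, pts3d_prev, feats2d_curr, K):
    """Evaluate the 3D-to-2D objective of Eq. (8) for a candidate
    4x4 SE(3) transformation T (sketch; a PnP solver minimises this)."""
    pts3d_prev = np.asarray(pts3d_prev, dtype=float)
    feats2d_curr = np.asarray(feats2d_curr, dtype=float)
    # Rigid transform of the 3D features from frame k-1 into frame k.
    p = np.hstack([pts3d_prev, np.ones((len(pts3d_prev), 1))]) @ np.asarray(T).T
    # Pinhole projection with intrinsic matrix K, then dehomogenise.
    uv_h = p[:, :3] @ np.asarray(K, dtype=float).T
    uv = uv_h[:, :2] / uv_h[:, 2:3]
    # Eq. (8): sum of squared 2D reprojection residuals.
    return float(np.sum(np.linalg.norm(uv - feats2d_curr, axis=1) ** 2))
```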

#### 4.1.2. Monocular Visual Odometry

To estimate the motion from a single camera, the Epipolar constraint between frames is used as

$$\begin{bmatrix} x\_{k-1,i} & y\_{k-1,i} & 1 \end{bmatrix} \mathbf{E} \begin{bmatrix} x\_{k,i} \\ y\_{k,i} \\ 1 \end{bmatrix} = 0, \ \forall \ 0 \le i \le N\_f \tag{9}$$

where **E** ∈ R3×3 is the essential matrix for the calibrated camera [16].

The essential matrix **E** is estimated using the five-point algorithm proposed in [59]. After obtaining the essential matrix, it is decomposed into the translation and rotation of the camera as described in [16]. Furthermore, the motion estimation algorithm is executed with the Least Median Of Squares (LMEDS) [60] scheme in order to achieve better motion estimation.
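The epipolar constraint of Eq. (9) can be checked directly for a candidate essential matrix. This NumPy sketch assumes the points are given in normalised camera coordinates (in practice, OpenCV's `findEssentialMat` and `recoverPose` perform the estimation and decomposition):

```python
import numpy as np

def epipolar_residuals(E, pts_prev, pts_curr):
    """Residuals of the epipolar constraint, Eq. (9): one residual per
    matched pair; exact matches under rigid motion give (near-)zero."""
    # Homogeneous normalised coordinates [x, y, 1] for both frames.
    a = np.hstack([np.asarray(pts_prev, dtype=float), np.ones((len(pts_prev), 1))])
    b = np.hstack([np.asarray(pts_curr, dtype=float), np.ones((len(pts_curr), 1))])
    # Eq. (9): [x_{k-1,i} y_{k-1,i} 1] E [x_{k,i} y_{k,i} 1]^T for each i.
    return np.einsum('ni,ij,nj->n', a, E, b)
```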

Using a monocular camera, the ego-motion of the camera can be estimated up to a scale. To compensate for this scale, a velocity measurement of the vehicle needs to be available through the use of an external sensor such as wheel encoders, an IMU, a GPS, or through the CAN data from the vehicle's tachometer.

The implementation of the VO algorithms from scratch was intended to ease the integration of the proposed pipeline; however, the pipeline can, in general, be integrated into any VO implementation.

#### *4.2. Datasets*

#### 4.2.1. KITTI Vision Benchmark Dataset

The KITTI Vision Benchmark Suite was selected as a publicly available dataset [34]. The dataset provides ground-truth ego-motion for 11 training sequences and 11 test sequences. The ground truth is provided as a list of 6D poses for the training sequences, whereas the test sequences are evaluated by submitting results to the KITTI website. The dataset is sampled at 10 Hz at driving speeds of up to 90 km/h, which makes it challenging for training and testing. Sequence 3 from the training subset is no longer available, as it was removed by KITTI due to its similarity to the test sequences.

The dataset comprises the following information: raw synced and rectified color images from the left and right cameras and raw 3D GPS/IMU unfiltered data, along with timestamps for all recordings. In order to convert the raw data to ROS bagfiles, the *kitti2bag* package was used [61]. The dataset also provides a tool for evaluating the performance of VO and visual SLAM algorithms; this tool was used in this paper to evaluate the proposed pipeline on the KITTI dataset.

#### 4.2.2. TUM RGB-D Dataset

The TUM RGB-D dataset is a large dataset containing sequences captured by an RGB-D camera, along with their ground truth, to establish a benchmark for the evaluation of VO and visual SLAM algorithms [35]. The dataset contains the color and depth images taken by a Microsoft Kinect camera, while the ground truth was recorded using a high-accuracy motion capture system with eight high-speed tracking cameras (100 Hz). The data were recorded at 30 Hz with a camera resolution of 640 × 480. The dataset also provides an online tool through which results are submitted for evaluating the performance of VO and visual SLAM systems. In this paper, the TUM sequences are evaluated using the Relative Pose Error (RPE), which is recommended by the dataset for VO algorithms [62].

The RPE measures the error in relative motion between pose pairs of the VO output. The TUM evaluation tool computes the error between all pairs of the output and generates evaluation metrics such as the Root Mean Square Error (RMSE), mean, and maximum. In this paper, the RMSE of the translation and orientation is used for evaluation.
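As an illustration, a translation-only version of the RPE RMSE over consecutive pose pairs can be sketched as follows; the full metric operates on SE(3) poses (composing relative transforms, as in the TUM tools), so this simplified form is our assumption for clarity:

```python
import numpy as np

def translational_rpe_rmse(gt, est):
    """RMSE of the translational Relative Pose Error over consecutive
    pose pairs, simplified to translation vectors only (the full RPE
    composes SE(3) poses and also evaluates the rotational error)."""
    gt = np.asarray(gt, dtype=float)
    est = np.asarray(est, dtype=float)
    d_gt = np.diff(gt, axis=0)    # ground-truth relative motions
    d_est = np.diff(est, axis=0)  # estimated relative motions
    err = np.linalg.norm(d_gt - d_est, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))
```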

#### 4.2.3. Images from Omnidirectional Robot

The Summit-XL Steel (Summit XLS) is a ground mobile robot with mecanum wheels, shown in Figure 5. The robot is equipped with an Astra RGB-D camera (https://shop.orbbec3d.com/Astra, accessed on 27 October 2022), as well as wheel encoders. Several experiments were conducted with the robot to validate the proposed pipeline, using the VICON motion capture system (https://www.vicon.com/, accessed on 27 October 2022) as a reference. The VICON system consists of 12 cameras, and the VICON bridge package was used to couple VICON with ROS [63]. Since the RGB-D VO case is tested and validated using the TUM dataset, the Summit-XL Steel sequences are used to validate the monocular VO, relying on the wheel encoders to obtain the speed for motion scaling. The evaluation is again conducted using the RMSE of the translation and orientation.

Three different sequences were executed using the robot in remote-control mode. In the first two sequences, the robot moved in semi-rectangular paths while, in the third sequence, the robot moved in a circular path. The total length of each of the paths was 12.5 m in the case of the rectangular paths and 6.5 m in the case of the circular path.

**Figure 5.** The omni-directional mobile robot used to validate the monocular VO algorithm with the proposed pipeline.

#### **5. Results and Discussion**

#### *5.1. KITTI Vision Benchmark Dataset/Stereo VO*

#### 5.1.1. Pose Accuracy Comparison

In order to show the efficacy of the proposed algorithm, the pose estimation results from a stereo VO are reported with and without the proposed pipeline. Table 1 shows the accuracy comparison using the 10 sequences available from the KITTI dataset. The results shown are the translation and rotation RMSE values generated by the dataset evaluation tool. As can be seen in the table, the pipeline enhanced the pose estimation accuracy in almost all the sequences.

In sequence 2 (shown in Figure 6), the effect of the pipeline is pronounced: it significantly enhanced the pose estimation accuracy compared to the VO without the pipeline. The divergence of the VO in the latter case is caused by the lack of sufficient features in the images, which prevented the VO from estimating the incremental motion for long stretches of the sequence. Using the CLAHE filter, on the other hand, increases the number of stable features in the images, while using the AOR algorithm along with RANSAC (which is present in both cases) ensures more accurate incremental motion estimation for all received images. This leads to a much better VO output, as shown in Figure 6.

In sequence 5, although the average translation RMSE of the VO without the pipeline is lower than that with the pipeline, the actual pose estimation performance of the VO with the pipeline is much better (as shown in Figure 7). This is not reflected in Table 1 because the drift in the orientation of the VO without the pipeline causes some estimated poses to appear closer to the ground truth than the output of the VO with the pipeline. However, the overall path estimated by the VO with the pipeline is superior.

Finally, in the case of sequence 6, the VO without the pipeline outperformed the VO with the pipeline, as seen in Figure 8. This may be attributed to the number of features remaining after applying the AOR being insufficient for accurate motion estimation; this is further discussed in Section 5.1.2.


**Table 1.** KITTI dataset accuracy comparison.

**Figure 6.** The pose estimation results of sequence 2 in the KITTI dataset with and without pipeline along with the dataset ground-truth. The proposed pipeline significantly enhances the output accuracy of the VO in this sequence. The VO output without the pipeline suffers from major drift errors and can even be considered to diverge. The black square represents the start of the paths and the black circles represent the ends of the paths.

**Figure 7.** The pose estimation results of sequence 5 in the KITTI dataset with and without the pipeline, along with the dataset ground-truth. The performance of the VO with the pipeline enhances the performance of the VO in both translation and orientation estimation. This is not reflected in the average RMSE of the sequence due to the drift in the orientation of the VO without the pipeline, which causes some estimated poses to look close to the ground truth, although the overall estimated trajectory is much worse. The black square represents the start of the paths and the black circles represent the ends of the paths.

**Figure 8.** The pose estimation results of sequence 6 in the KITTI dataset with and without the pipeline, along with the dataset ground-truth. The VO without the pipeline shows more accurate pose estimation compared to the VO with the pipeline; however, the performance of both algorithms is very similar. The black square represents the start of the paths and the black circles represent the ends of the paths.

#### 5.1.2. Effect of AOR

One of the contributions of this paper is the new outlier rejection algorithm, AOR. In this subsection, the effect of AOR alone on the performance of the VO is studied. To this end, the translation and orientation RMSE results for the VO with AOR only are reported in Table 2.

As shown in Table 2, the AOR contributed significantly to the improvement of some of the sequences. For example, the table shows that the use of AOR was responsible for the improvement in sequence 2, which diverged without it, as shown in Figure 6. Furthermore, the use of AOR also yielded better results for sequences 0, 4, and 7.


**Table 2.** AOR effect on pose estimation.

For the remaining sequences, however, the use of AOR alone led to worse results. The reason is the lack of sufficient features for motion estimation after applying the AOR, which degrades the motion estimate due to the limited information available. This problem can be addressed by increasing the AOR threshold *η* to admit more inliers.

As the results show, the AOR is an aggressive outlier rejection method: it may remove matched features with only slight deviations. Its use therefore requires a sufficient number of stable features in the image; otherwise, it deteriorates the motion estimation by reducing the number of matched features. The threshold weight can be tuned to balance the number of inliers against the remaining outliers, but a setting that improves some sequences may deteriorate others. This is why integrating AOR with CLAHE and SSC is such a good combination: each of the three stages enhances the motion estimation while compensating for the drawbacks of the others.

Although the use of AOR can remove many matched features, CLAHE increases the number of stable features in the images. The features added by CLAHE, however, might concentrate in a certain region of the image; the use of SSC prevents such a poor distribution of the features. Likewise, while CLAHE adds more stable features, it can also introduce more outliers, which AOR then removes. In conclusion, the three stages of the pipeline form a desirable combination that overcomes the drawbacks of each stage while utilizing its advantages.

It is worth mentioning that although AOR alone worsened the results of some sequences, the overall performance of the VO with the pipeline (2.71%) was better than that of the VO without the pipeline (3.29%). Furthermore, the complete pipeline still resulted in better accuracy for almost all sequences compared to the VO without the pipeline, as shown in Table 1.

The VO with the pipeline performs better because CLAHE increased the overall number of features, while SSC distributed those features uniformly, helping to achieve better results along with the AOR outlier removal.

#### 5.1.3. Computational Cost

It is expected that adding processing stages to the VO algorithm will lead to an increase in computational time. This can indeed be seen in Table 3, where the average computation time for the VO is reported after adding each of the proposed stages of the pipeline. However, the significant benefits in the accuracy of the pose estimation outweigh the increased computational cost.

In Table 3, the computation time of the VO algorithm with only the AOR algorithm is also reported. Notice that the computation time after adding the AOR to the VO is less than that of the VO without AOR. As discussed in Section 3, this is because the AOR removes the false-matched features, which simplifies the task of RANSAC and results in faster motion estimation convergence and faster overall performance, as shown in Table 3. This also explains why the full pipeline computational time is less than that of adding CLAHE only or CLAHE and SSC.


**Table 3.** Computation time comparison for KITTI sequences.

The reported average computational time shows that the algorithm is capable of working in real time at up to 6 fps. The performance of the algorithm, as well as the accuracy of the pose estimation, can be further enhanced through the use of a graphics processing unit (GPU) and multithreaded processing.

#### *5.2. TUM RGB-D Dataset/RGB-D VO*

#### 5.2.1. Pose Accuracy Comparison

Table 4 shows the translation and orientation RMSE for nine sequences from the TUM RGB-D dataset. As shown in the table, the RGB-D VO with the proposed pipeline shows better pose estimation accuracy for all sequences. This enhancement varies from one sequence to another based on the lighting and the number of features available in each sequence.

Figure 9 shows an example of the pose estimation output from the VO algorithm with and without the pipeline. It can be seen that the VO without the proposed pipeline suffers from large motion estimation errors at the beginning of the path, which in turn results in large drift errors over the rest of the path. As for the VO with the pipeline, although there is still an error in the estimated path, the error is significantly smaller than that of the VO without the pipeline, especially at the beginning of the path, causing a much better estimation for the rest of the path. The better performance of the VO with the pipeline algorithm is attributed to the increased amount of detected features at the beginning of the path compared to that of the VO without the pipeline.


**Table 4.** TUM accuracy comparison.

**Figure 9.** The pose estimation results for the sequence fr1/pioneer\_360 in the TUM dataset with and without the pipeline along with the dataset ground-truth. The pose estimation accuracy of the VO with the pipeline is superior, while the VO without the pipeline suffers from both poor translation and orientation estimation. The black square represents the start of the paths and the black circles represent the ends of the paths.

Since the TUM sequences are recorded indoors, adding the CLAHE stage results in a significant increase in the detected features (some examples are shown in Figure 10). Note that for these sequences, an RGB-D VO algorithm is used, which means that only the features with an observable depth by the depth sensor can be used for motion estimation. In other words, the far features in the images cannot be used. Through the SSC algorithm, the features detected in the images are well distributed, and thus, the number of close features with observable depth increases (see Figure 10).

The results shown in Figure 9 and Table 4 confirm the efficacy of the proposed pipeline for VO algorithms, even for indoor scenarios.

**Figure 10.** Three examples from TUM sequences showing the effect of CLAHE and SSC in indoor scenarios. The images in the top row show the images with the detected features using SURF, detected without the use of CLAHE and SSC algorithms. The images in the bottom row show the features detected by SURF after adding the CLAHE and SSC stages.

#### 5.2.2. Computational Cost

Table 5 shows the average computational time for the VO with the different added stages. Notice that, in this case, the results differ slightly from the computational cost analysis for the KITTI sequences. As expected, the computational cost increases with the different stages of the proposed pipeline. However, for the KITTI sequences, adding the AOR algorithm resulted in faster performance than the VO without any stages; this is not the case for the TUM sequences, where the computational cost still increases with the AOR algorithm. Based on qualitative and quantitative analysis, it is postulated that either the number of features detected in the TUM sequences or the number of outliers in the matched feature set is larger. Nevertheless, adding AOR still lowers the computational cost of the VO algorithm: as can be seen in Table 5, the average computational cost when adding only CLAHE and SSC is larger than that of the full pipeline.


**Table 5.** Computation time comparison for TUM sequences.

*5.3. Summit XL Steel Sequences/Monocular VO*

#### 5.3.1. Pose Accuracy Comparison

The translation and orientation RMSE are reported in Table 6 for three different scenarios. The results show the accuracy enhancement in the case of the VO with the pipeline. For the Summit XL Steel robot sequences, a monocular VO algorithm was used to validate the proposed processing pipeline. Since the scale is unobservable, the robot's wheel encoders are used to calculate its speed and the scale of the odometry. This means that a component of the error is due to the error in the velocity measurements taken by the encoders. However, since the same data are used in both cases, this effect is identical and does not bias the comparison.


**Table 6.** Summit accuracy comparison.

Figure 11 shows the estimation results for VO with and without the pipeline. As can be seen in the figure, the accuracy is superior in the case of the VO with the proposed pipeline. This is especially true at the end of the sequence, at which the VO without the pipeline suffered from a large drift error. The presence of the AOR resulted in removing many matched outliers, which would have caused bad motion estimation and a significant amount of drift errors in the case of VO without the proposed pipeline.

**Figure 11.** The pose estimation results for a rectangular sequence executed by the Summit XL Steel robot. The figure shows the output of the VO with and without the proposed pipeline. The VO with the pipeline shows better pose accuracy, especially at the end of the sequence, while the VO without the pipeline suffers from a significant amount of drift error. The black square represents the start of the paths and the black circles represent the ends of the paths.

#### 5.3.2. Computational Cost

Table 7 shows the computational time analysis of several combinations of VO algorithms, including the proposed VO algorithm. As illustrated before, the best computational performance was achieved by the VO with the AOR algorithm. This is a direct result of the better outlier removal of the AOR, which leads to faster convergence of the motion estimation algorithm. The table also shows the translation and orientation RMSE for each of the VO combinations. The results confirm that the proposed VO yields the best performance.


**Table 7.** Computation time comparison for summit XLS steel sequences.

#### *5.4. Discussion*

For all the scenarios, and for all VO types used in this paper, the proposed pipeline showed better performance compared to the VO without the pipeline. Specifically, adding the three additional stages to the actual VO algorithm enhanced the accuracy by an average of 37% for the considered datasets. These additional three stages can be added to any feature-based VO algorithm to enhance its accuracy and robustness.

In Tables 3, 5 and 7, the results for different combinations of the three proposed stages were reported. In the three cases, the VO with the full proposed pipeline showed better pose estimation accuracy. This shows that the three stages proposed in this paper are integral. Each one of the three serves its own purpose and contributes to the overall enhancement. However, as expected, this came with an increase in the computational cost of the algorithm. Notice that the increase in the computational cost is still not significant (an average of 37 ms) and does not result in a large reduction in the number of frames per second. In most applications, this increase in the computational cost can be accepted to improve the pose estimation accuracy in return (as shown in Table 8).


**Table 8.** The pipeline effect on VO.

#### **6. Conclusions and Future Works**

In this paper, an image processing pipeline was introduced to enhance the accuracy and robustness of VO algorithms. The proposed pipeline consists of three stages, CLAHE, SSC, and AOR. Each stage addresses a separate issue associated with pose estimation error.

The proposed pipeline is intended to be generic and modular, which can be embedded in any feature-based algorithm in order to enhance its performance. In order to validate the proposed pipeline, sequences from KITTI and TUM datasets, as well as experimental sequences generated by a commercial omnidirectional mobile robot, were used. For each dataset, one type of VO was used for validation, namely stereo, RGB-D and monocular. The quantitative and qualitative results show that the proposed pipeline has a significant enhancement in the VO accuracy and robustness, with a minor increase in the computational time.

As mentioned earlier, a VO algorithm relies on integration and, consequently, can suffer from large accumulated errors or overall divergence of the pose estimation during operation. This can occur due to several causes, such as poor lighting conditions, false-matched features, or motion bias. Throughout this paper, these three causes of error were addressed by designing a generic pipeline that can be integrated into any visual odometry algorithm to enhance its accuracy.

As future work, the proposed pipeline is planned to be integrated into visual SLAM algorithms, and its effect will be studied. Furthermore, comparisons with deep-learning approaches will be conducted to determine which approach works best under which conditions. Several additional filtration steps will also be investigated to further enhance the performance of VO algorithms. Meanwhile, the computational cost of the algorithm is expected to be reduced through the use of GPUs and parallel computing techniques.

**Author Contributions:** Conceptualization, M.S., M.O. and A.H.; methodology, M.S., M.O. and A.H.; software, M.S., M.O. and A.H.; validation, M.S., M.O. and A.H.; formal analysis, M.S., M.O. and A.H.; writing—original draft preparation, M.S., M.O. and A.H.; writing—review and editing, M.S., M.O., A.H., M.W.M., S.J. and W.M.; supervision, A.H., M.W.M., S.J. and W.M.; project administration, A.H., M.W.M., S.J. and W.M.; funding acquisition, A.H. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Data Availability Statement:** Publicly available datasets were analyzed in this study. This data can be found here: [https://vision.in.tum.de/data/datasets/rgbd-dataset, www.cvlibs.net/datasets/kitti, accessed on 27 October 2022].

**Acknowledgments:** The work in this paper is in part supported by Natural Sciences and Engineering Research Council (NSERC) of Canada under the Strategic Partnership Grant for Projects (STPGP-506987-2017) and under Postdoctoral Fellowship grant (PDF-532957-2019).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **References**


## *Article* **Up-Sampling Method for Low-Resolution LiDAR Point Cloud to Enhance 3D Object Detection in an Autonomous Driving Environment**

**Jihwan You and Young-Keun Kim** *∗*

School of Mechanical and Control Engineering, Handong Global University, Pohang 37554, Republic of Korea
**\*** Correspondence: ykkim@handong.edu

**Abstract:** Automobile datasets for 3D object detection are typically obtained using expensive high-resolution rotating LiDAR with 64 or more channels (Chs). However, the research budget may be limited such that only a low-resolution LiDAR of 32-Ch or lower can be used. The lower the resolution of the point cloud, the lower the detection accuracy. This study proposes a simple and effective method to up-sample a low-resolution point cloud input that enhances the 3D object detection output by reconstructing objects in the sparse point cloud data to produce denser data. First, the 3D point cloud dataset is converted into a 2D range image with four channels: x, y, z, and intensity. The interpolation on the empty space is calculated based on both the pixel distance and the range values of six neighbor points to preserve the shapes of the original objects during the reconstruction process. This method solves the over-smoothing problem faced by conventional interpolation methods, and improves the operational speed and object detection performance compared to recent deep-learning-based super-resolution methods. Furthermore, the effectiveness of the up-sampling method on 3D detection was validated by applying it to baseline 32-Ch point cloud data, which were then selected as the input to a point-pillar detection model. The 3D object detection results on the KITTI dataset demonstrate that the proposed method can increase the mAP (mean average precision) of pedestrians, cyclists, and cars by 9.2%p, 6.3%p, and 5.9%p, respectively, compared to the baseline of the low-resolution 32-Ch LiDAR input. In future works, various dataset environments apart from autonomous driving will be analyzed.

**Keywords:** 3D upsampling; LiDAR super-resolution; interpolation; 3D object detection

#### **1. Introduction**

Light detection and ranging (LiDAR) is a core mapping and detection sensor technology that ensures the safe and autonomous navigation of robots, drones, and vehicles. LiDAR is typically implemented in autonomous driving applications for 3D object detection, and particularly the detection of pedestrians, cyclists, and vehicles from a few to hundreds of meters. A 3D LiDAR with high vertical resolution, such as a 64-channel (Ch) LiDAR, can achieve highly accurate 3D detection of relatively small objects within a large volume. Consequently, it is used in autonomous vehicles to obtain a vertically denser point cloud and to achieve better object segmentation and a safer driving experience.

Furthermore, the sparsity of the point cloud increases with the distance of the object owing to the fixed vertical angles of the laser diodes of a 3D LiDAR. A low-resolution LiDAR, such as a 16-Ch model, may not be able to capture the points required to distinguish a pedestrian from the background, causing major safety issues.

Accordingly, the representative open datasets of LiDAR 3D object detection, such as the KITTI [1] and Waymo Open [2] datasets, obtain the data point clouds of the driving environment by using 64-Ch LiDARs.

A high-resolution LiDAR, although preferable for accurate detection and safe driving, is expensive. A 64-Ch LiDAR is approximately 10–20 times more expensive than a low-resolution 16-Ch LiDAR. The high cost of high-resolution LiDAR considerably restricts its widespread implementation in commercial vehicles. This cost can be reduced through the super-resolution of the sparse point cloud of a low-resolution LiDAR while achieving a high 3D object detection accuracy.

**Citation:** You, J.; Kim, Y.-K. Up-Sampling Method for Low-Resolution LiDAR Point Cloud to Enhance 3D Object Detection in an Autonomous Driving Environment. *Sensors* **2023**, *23*, 322. https://doi.org/10.3390/s23010322

Academic Editors: Yafei Wang, Chao Huang, Peng Hang, Zhiqiang Zuo and Bo Leng

Received: 31 October 2022; Revised: 20 December 2022; Accepted: 23 December 2022; Published: 28 December 2022

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The conventional 3D data up-sampling methods include bilinear or bicubic interpolation [3]. Although these present a high processing speed, they overly smooth the object edges. Most existing deep-learning super-resolution models are designed for 2D images, and only a few models can be implemented for 3D point clouds. PU-Net [4] proposed a method for up-sampling 3D point cloud models; however, it primarily focused on resolving small objects. Shan et al. [5] proposed a deep-learning model to increase the LiDAR resolution, wherein the super-resolution effects of the model on simulation data were compared with those of other conventional methods. However, the object detection enhancement due to the super-resolution was not analyzed and validated. Contrary to such methods, we propose a simple and efficient method for up-sampling the point cloud that is not based on a deep-learning model. This method enhances the 3D object detection accuracy by predicting and filling the sparse space in the point cloud data obtained by a low-resolution LiDAR. The proposed method also solves the over-smoothing problem of the conventional interpolation methods while achieving a higher operational speed and better object detection performance compared to the aforementioned deep-learning 3D super-resolution methods [4,5]. The proposed method first converts the 3D point cloud data into 2D range images using four channels. It then implements a weighted interpolation with a decaying factor and excludes the zero-value outliers. Subsequently, the up-sampled point cloud is passed to the widely used point-pillar [6] 3D object detection model to evaluate the improvement in the detection accuracy.

The contributions of this paper can be summarized as follows:


#### **2. Related Work**

#### *2.1. 2D Image Up-Sampling*

Previous studies conducted on super-resolution were primarily focused on the enhancement of a single 2D image [7]; in this process, a low-resolution image with coarse details is converted to a corresponding high-resolution image with refined details. Other similar terminologies for super-resolution in the research community include image scaling, interpolation, and up-sampling. In the current study, we use the term up-sampling to refer to super-resolution.

The conventional image up-sampling methods employ interpolation methods that fill an empty space with data points using a low-order polynomial or other general kernels. The most commonly used interpolation approaches on 2D images include the nearest-neighbor, bilinear, bicubic, and Lanczos [7] schemes; these methods are simple and present a low computational overhead. However, the visual quality obtained is low due to the heavy smoothing performed on the edges.
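As a minimal illustration of these classical schemes (not code from the paper), nearest-neighbor and linear up-sampling along one axis of an image can be sketched in NumPy:

```python
import numpy as np

def upsample_nearest(img, factor=4):
    # Nearest-neighbor: replicate each row `factor` times.
    return np.repeat(img, factor, axis=0)

def upsample_linear(img, factor=4):
    # Linear interpolation along the vertical axis only.
    rows_in = np.arange(img.shape[0], dtype=float)
    rows_out = np.linspace(0.0, img.shape[0] - 1, img.shape[0] * factor)
    return np.stack([np.interp(rows_out, rows_in, col) for col in img.T],
                    axis=1)
```

Both run fast; bicubic or Lanczos kernels differ only in the weighting function, and, as noted above, all of them smooth edges heavily.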

Deep-learning methods, which employ highly non-linear kernels, present better up-sampling performance for complex scenarios. The super-resolution convolutional neural network (SR-CNN) [8] is a pioneering end-to-end up-sampling model that uses several convolution layers stacked on top of each other. The efficient sub-pixel convolutional neural network (ESPCN) [9] is a more complex and faster up-sampling method that extracts several features in the low-resolution feature map, which are aggregated to reconstruct a high-resolution image output. Additionally, various deep-learning approaches have been presented for image up-sampling, such as residual network methods [10,11], recursive network methods [12,13], progressive reconstruction designs [14,15], densely connected networks [16,17], multi-branch designs [18,19], attention-based networks [20,21], multiple degradation handling networks [22,23], and GAN (generative adversarial network)-based models [24,25].

#### *2.2. 3D Point Cloud Up-Sampling*

The 3D point cloud up-sampling process requires interpolation in a higher-dimension space, including the depth map, when compared to the 2D image up-sampling process. Typically, the low-resolution point cloud data is a 3D or higher-dimension space comprising an empty space with sparse, unordered data points, which increases the complexity of the interpolation.

#### 2.2.1. 3D Up-Sampling with RGB Image

Most existing studies on depth map up-sampling have combined corresponding 2D RGB images. The existing image-guided up-sampling methods use fusion filters such as Markov random fields (MRF) [26,27] and bilateral filtering with pixel color information [28]. The depth interpolation is estimated using a filter based on information from the neighboring pixels, such as the geometric distance, intensity, and anisotropic diffusion [29]. Some recent studies have introduced deep-learning models for image-guided 3D up-sampling. Shan et al. [5] projected the 3D point cloud as a 2D range image and up-sampled it to a denser map using an encoder–decoder architecture with residual connections similar to REDNet [11] and UNet [30]. Although the reconstruction loss is lower compared to conventional interpolation methods, the simulation-based method [5] did not demonstrate the effectiveness of the reconstruction in improving 3D object detection accuracy.

#### 2.2.2. 3D Up-Sampling with Point Cloud

For the applications where only LiDAR is available, the 3D up-sampling of the sparse point cloud must be performed without using any other high-resolution guides. Only a few studies have been conducted on LiDAR up-sampling methods that map the sparse 3D points into a higher resolution using deep-learning models.

PU-Net [4] is a recently developed data-driven up-sampling method for 3D point cloud data; it learns multi-level features of each point and implicitly expands the point sets via multi-branch convolution units in feature spaces. This model performs the up-sampling of small-sized objects in indoor environments. The reconstruction loss involved is lower compared to conventional interpolation methods; however, the effectiveness of the reconstruction in enhancing the 3D object detection accuracy was not demonstrated.

The proposed up-sampling method for low-resolution LiDAR point clouds directly enhances the 3D object detection. The proposed method is not based on deep learning, contrary to the recent methods; hence, it achieves more efficient calculations. The reconstructed 3D point clouds are then fed to the 3D object detection model to evaluate the effectiveness of the object detection. Furthermore, we validated the up-sampling effectiveness of the proposed method by performing a comparative analysis of our method with the recently developed deep-learning method [5] and the conventional interpolation methods in terms of the object detection performance on the KITTI dataset.

#### **3. Comparison Methods**

#### *3.1. Test Data Preparation*

This section presents the KITTI dataset [1] used for 3D object detection in the bird's-eye view. The dataset was acquired using a Velodyne HDL 64-Ch LiDAR, and a total of 7481 frames were used for training. High-resolution LiDAR range images were obtained by converting the 64-Ch LiDAR point cloud data into range images. The low-resolution range images were obtained by extracting some rows out of the 64-Ch LiDAR range images. In the following section, we compare the up-sampling performance of the conventional and proposed methods from the low-resolution (16-Ch) LiDAR point cloud to the high-resolution (64-Ch) point cloud.
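The row extraction above can be sketched as follows; the stride-based selection is an assumption, since the paper does not specify which rows are kept:

```python
import numpy as np

def simulate_low_res(range_img, stride=4):
    # Keep every `stride`-th row of the high-resolution range image,
    # e.g. 64-Ch -> 16-Ch with stride 4 (row-selection scheme assumed).
    return range_img[::stride, :]
```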

#### *3.2. Conventional Up-Sampling*

The representative conventional up-sampling methods were selected for the performance comparison with the proposed up-sampling method. The conventional up-sampling methods include the interpolation-based and deep-learning-based methods, as explained in Section 2.

Figure 1 presents comparisons of the basic representative interpolation methods, which include the nearest-neighborhood, bilinear, and bicubic interpolations. As shown in Figure 1, the nearest-neighborhood interpolation method cannot describe complex shapes such as the curves of objects, while several noisy points are generated around the object by the bilinear and bicubic interpolations. Considering the sparsity characteristic of a range image, it is essential to perform the interpolation using only the non-zero-valued points in the range image.

**Figure 1.** Up-sampled point cloud from 16-Ch (**a**) to 64-Ch (**e**) using classical interpolations (pedestrian). Comparison of (**b**) nearest-neighborhood, (**c**) bilinear, and (**d**) bicubic interpolations.

#### *3.3. Deep Learning Based Up-Sampling*

Among the deep-learning-based up-sampling methods, we compared the ESPCN [9] network and the method proposed by Shan et al. [5] as references. When the ESPCN [9] model was implemented, the distribution of points did not sufficiently describe the shape of an object, as shown in Figure 2. The simulation-based model [5] presented relatively good concurrence with the shape of the 64-Ch LiDAR output; however, some noisy points remained around the object.

**Figure 2.** Up-sampled point cloud using deep-learning-based methods (pedestrian). The increase of the vertical angle resolution increases the density of the point cloud along all of the Cartesian axis directions.

#### **4. Proposed 3D Up-Sampling Method**

Figure 3 presents the outline of the research strategy used for the proposed method. First, the unordered 3D point cloud data are projected onto ordered range map images with multiple channels. The proposed up-sampling method reconstructs the sparse range image input into a dense 3D point cloud. It is then fed to the 3D object detection model as the input data and the detection performance is evaluated on the KITTI dataset.

**Figure 3.** Research strategy for the proposed up-sampling method for 3D detection.

#### *4.1. Projection to Range Image*

The number of 3D points obtained through a single scan of a rotating LiDAR is determined by the vertical channel numbers, vertical resolution (*Vres*), horizontal FOV, and horizontal angular resolution (*Hres*).

The point cloud data is structured as a list of 3D positions and the intensity value of the laser beam points in the order of laser diode scanning, which may vary based on the LiDAR model. The dimension of the point cloud is expressed as *N* by *D*, where *N* represents the total number of beam points and *D* represents the feature numbers of the beam point, such as the 3D Cartesian coordinates and intensity value. Therefore, the data in the point cloud are in an unordered form in terms of the spatial coordinates, contrary to the 2D image pixels.

The complexity of handling such unordered data can be reduced by projecting the point cloud into a range map, which is spatially ordered using the 2D coordinates of the horizontal and vertical scanning angles, as shown in Figure 4. The dimensions of the range map are *H* by *V* by *C*, where *H* denotes the horizontal azimuth (*α*) angles, *V* denotes the number of LiDAR vertical channels, and *C* denotes the channel of range measurement and intensity. The up-sampling techniques for 2D images can be implemented easily since the range map is an ordered structure.

The projection of the 3D point cloud data to the range map in the azimuth and vertical angles can be processed using the transformation equation in the spherical coordinates, which is expressed as follows:

$$\begin{aligned} R &= \sqrt{x^2 + y^2 + z^2} \\ \alpha &= \arctan(y, x) \\ \omega &= \arctan\left(z, \sqrt{x^2 + y^2}\right) \end{aligned} \tag{1}$$

The dimension of the 3D point cloud from a Velodyne LiDAR (HDL-64E) is 2048 × 64 × 4 Ch (X-Ch, Y-Ch, Z-Ch, Intensity-Ch), and it can be converted to a range map with the dimension of 2048 × 64 × 2 Ch (range-Ch, intensity-Ch), indexed by *α* and *ω*.
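Equation (1) can be sketched directly in NumPy; the per-point spherical coordinates below would still need to be binned into the *H* × *V* grid using *Hres* and *Vres*, a step omitted here:

```python
import numpy as np

def spherical_projection(points):
    """Eq. (1): N x 4 point cloud (x, y, z, intensity) -> range R,
    azimuth alpha, and elevation omega per point."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    R = np.sqrt(x**2 + y**2 + z**2)
    alpha = np.arctan2(y, x)                      # horizontal azimuth
    omega = np.arctan2(z, np.sqrt(x**2 + y**2))   # vertical elevation
    return R, alpha, omega
```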

**Figure 4.** 2D Range image from 3D point cloud.

#### *4.2. Proposed Up-Sampling Technique*

The existing up-sampling methods do not adequately represent the shape of high-resolution LiDAR points when there are meaningless zero values within the pixels of the range image or when estimating the boundary of an object. In this study, we present a novel interpolation technique that minimizes the influence of outlier points. The robustness of the proposed interpolation method is improved by implementing the following strategies.

#### 4.2.1. Pixel-Distance Weighted Interpolation

The up-sampling is applied in the elevation direction to increase the density of the vertical angle of the LiDAR channels. The proposed method interpolates the empty spaces of the LiDAR range image using the six surrounding neighbor pixels with weights based on their relative distances from the anchor point, similar to the four-point depth interpolation method proposed by Lim et al. [31]. The six neighborhood pixels for the interpolation are labeled from *P*1 to *P*6, starting from the top-left pixel, as shown in Figure 5. The interpolated value, *P′*, is obtained from the weighted sum of the neighbors with a decaying factor against the distance as follows:

$$W_i = e^{-0.5 d_i} \tag{2}$$

$$P' = \frac{\sum_{i=1}^{6} W_i P_i}{\sum_{i=1}^{6} W_i}, \tag{3}$$

where *Pi*, *Wi*, and *di* represent the range values, the weights, and the pixel distance between *Pi* and *P′*, respectively.
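Equations (2) and (3) amount to a few lines of NumPy; this is a sketch only, with the lookup of the six neighbors in the range image omitted:

```python
import numpy as np

def interpolate_pixel(neighbors, distances):
    """Eqs. (2)-(3): weighted average of the six neighbor range values
    with exponentially decaying weights over pixel distance."""
    w = np.exp(-0.5 * np.asarray(distances, dtype=float))  # Eq. (2)
    p = np.asarray(neighbors, dtype=float)
    return np.sum(w * p) / np.sum(w)                       # Eq. (3)
```

Equidistant neighbors reduce to a plain mean; a closer neighbor dominates the result, as intended by the decaying factor.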

The interpolated output in the 2D range image is converted to a 3D point cloud to visualize and analyze the interpolation effect, as shown in Figure 6.

The distance-weighted interpolation result presented in Figure 6c depicts a denser 3D output but with amplified noise, because neighboring points with zero or maximum range values act as outliers relative to the surrounding data. The negative effect of such pixels on the interpolation can be reduced by removing the outlier neighbor pixels. A zero- or maximum-valued pixel in the 2D range image corresponds to a LiDAR beam that did not hit any object within the measurable range.

**Figure 6.** Up-sampled point cloud using proposed methods (pedestrian). Comparison of: (**a**) raw data of 16-Ch LiDAR; (**b**) ground truth 64-Ch LiDAR; (**c**) our proposed method with only pixel-based distance terms; and (**d**) the final proposed method.

#### 4.2.2. Pixel-Distance and Range Weighted Interpolation

These outlier points can be skipped in the weighted sum process using the coefficient *si*, which takes a value of 0 for outliers and 1 otherwise.

Furthermore, the up-sampling can be enhanced by readjusting each weight, *Wi*, based on the relative depth range among the neighbor pixels. The closer the depth of a neighbor pixel is, the more weight it is given in the interpolation, based on the assumption that closer-range neighbor pixels are likely to be projected on the same object. The modified interpolation equation can then be expressed as:

$$P' = \frac{\sum\_{i=1}^{6} s\_i \mathcal{W}\_i P\_i}{\sum\_{i=1}^{6} s\_i \mathcal{W}\_i} \tag{4}$$

with:

$$\mathcal{W}\_{i} = e^{-0.5d\_{i}} \cdot \left(\frac{2}{1 + e^{\left(R\_{i} - R\_{\text{min}}\right)}}\right) \,, \tag{5}$$

where *Ri* is the range value of each neighbor point and *R*min is the minimum range value among them. This interpolation result, as shown in Figure 6d, clearly removes the noisy interpolated data for enhanced up-sampling.
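The outlier-aware variant of Equations (4) and (5) can be sketched as follows; the helper name and the `r_max` cutoff used to flag maximum-range returns are our own assumptions:

```python
import numpy as np

def interpolate_pixel_robust(neighbors, distances, r_max=100.0):
    """Interpolation that skips zero/maximum-range outliers (Equations (4)-(5))."""
    P = np.asarray(neighbors, dtype=float)
    d = np.asarray(distances, dtype=float)
    s = (P > 0.0) & (P < r_max)   # s_i = 0 for outlier neighbors, 1 otherwise
    if not s.any():
        return 0.0                # no valid neighbor: leave the pixel empty
    r_min = P[s].min()            # R_min among the valid neighbors
    # Equation (5): decay with pixel distance and with the range gap to R_min
    w = np.exp(-0.5 * d) * (2.0 / (1.0 + np.exp(P - r_min)))
    w = w * s                     # Equation (4): sum only over valid neighbors
    return float(np.sum(w * P) / np.sum(w))
```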

#### 4.2.3. Increasing Channel Dimension of 2D Range-Image

The interpolation performance can be further improved by increasing the dimensions of the 2D range image. The 1-Ch 2D range image can be converted to a higher-dimension 2D range image with 4-Ch of the Cartesian coordinates (*x*, *y*, *z*) and intensity values. The values of *x*, *y*, and *z* can easily be obtained from the azimuth, elevation, and range values as:

$$\begin{aligned} x &= r \cos \omega \cos \alpha \\ y &= r \cos \omega \sin \alpha \\ z &= r \sin \omega \end{aligned} \tag{6}$$

where azimuth (*α*) and elevation (*ω*) represent the coordinates of the 2D range image, and *r* denotes the range value of the pixel.
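Under these angle conventions, Equation (6) maps each range-image pixel to Cartesian coordinates directly; a minimal vectorized sketch (angles in radians, function name our own):

```python
import numpy as np

def range_image_to_xyz(r, azimuth, elevation):
    """Convert range-image samples (r, alpha, omega) to (x, y, z) via Equation (6)."""
    x = r * np.cos(elevation) * np.cos(azimuth)
    y = r * np.cos(elevation) * np.sin(azimuth)
    z = r * np.sin(elevation)
    return np.stack([x, y, z], axis=-1)
```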

Figure 6 presents the up-sampling result of the proposed method on a pedestrian dataset. The result demonstrates that this method closely resembles the 64-Ch ground truth. Additionally, the proposed interpolation method presents the highest resemblance to the reference interpolated output when compared to the conventional methods presented in Figures 1 and 2.

#### **5. Up-Sampling Effect on 3D Object Detection**

#### *5.1. Experiment Overview*

The effectiveness of the proposed LiDAR up-sampling method was demonstrated by feeding it as the input to the selected 3D object detection model and evaluating the detection accuracy. The point-pillar model [6], which is an effective and widely used 3D detection model with LiDAR data, was used as the 3D model. This model was trained using the dataset up-sampled via the proposed method from the baseline point cloud data.

The baseline dataset comprises the point cloud data of a low-resolution (32-Ch) LiDAR, which was generated by down-sampling the KITTI [1] 3D dataset of a 64-Ch LiDAR.

The numbers of data points in the training, validation, and test datasets were 7481, 1856, and 3769, respectively. The overall performances of the 3D detection task were evaluated based on the mAP (mean average precision) of easy, moderate, and hard difficulty for each class of cars, pedestrians, and cyclists, which are the most common obstacles in an autonomous driving environment.

#### *5.2. Algorithm Design (Ablation Studies)*

An ablation study was conducted to analyze the effect of the following methods: (A) pixel-distance weighted interpolation, (B) pixel and range weighted interpolation, (C) additional channel on the 2D range image, and (D) ground removal on the 3D object detection. The baseline dataset of the 32-Ch point cloud was up-sampled to a 64-Ch point cloud using the aforementioned methods and the accuracy of the 3D object detection was evaluated for comparison.

#### 5.2.1. (A) Effect of Pixel Distance Weight Interpolation

The baseline dataset was up-sampled to a 64-Ch point cloud using only the six-point pixel distance weighted interpolation based on Equations (2) and (3).

This method presented inferior detection results for all of the classes when compared to the detection output with the baseline dataset of the 32-Ch point cloud, as described in Table 1. The interpolation performed using only the pixel distances of the neighbor points, similar to the method reported in [31], raised the 3D detection accuracy from 31.32% to 43.85%.


**Table 1.** 3D object detection performance (mAP) of the proposed methods on the test dataset.

#### 5.2.2. (B) Effect of Pixel Distance and Range Weight Interpolation

The up-sampling method was enhanced by neglecting the zero-value range points to avoid the outlier effect on the interpolation. Additionally, the range difference between the anchor point and neighbors was considered to apply a higher weight on the points that had a higher chance to be on the same object surface. The data obtained from the interpolation performed using Equations (4) and (5) were applied to the baseline dataset and used as the training data for the 3D detection model.

The mAP of the overall classes for all of the difficulty levels increased by up to 3.6% when compared to the baseline result, as presented in Table 1. The mAP of the car detection task increased from the 65.82% achieved by the baseline to 72.39%. However, the mAP of the cyclist detection task decreased slightly.

#### 5.2.3. (C) Effect on Increasing Range Image Channels

In the previous sections, the point clouds were up-sampled to a 2D range image using 2-Ch of the range and intensity values. The axes of the image are the azimuth and vertical angles, as shown in Figure 4.

A 2D image with a higher number of channels was generated using the Cartesian 3D coordinates of each point obtained with Equation (6). Therefore, the input image used for the interpolation was a 2D image with 4-Ch comprising the x, y and z-axis position values and the intensity value.

The detection achieved by the up-sampled image of the 4-Ch presented superior performance when compared to the baseline. The overall mAP for all the classes was enhanced by as much as 8%. The mAP of the easy level cyclist detection increased from 58.1% of the baseline to 64.95%. The overall mAP was enhanced by 5.01%p, 4.59%p, and 4.47%p for the easy, moderate, and hard levels, respectively, when compared to the proposed interpolation method with the 2-Ch image.

Figures 7 and 8 show the 3D object detection results of the selected scene for pedestrians and vehicles, respectively. The up-sampled point cloud and the detection results closely resemble the ground truth. This helped in accurately reconstructing the objects for locating the pedestrians and vehicles.

**Figure 7.** An example of 3D object detection performed using the proposed up-sampling method for pedestrian detection.

**Figure 8.** An example of 3D object detection performed using the proposed up-sampling pre-processing for vehicle detection.

#### 5.2.4. (D) Effect on Ground-Removal Pre-Processing

A high proportion of the point cloud data included the points on the ground. The effect on the 3D detection accuracy due to the ground-removal pre-processing performed before the up-sampling was analyzed.

The ground removal process was simplified by discarding all of the points located at a lower elevation than the ego-vehicle. Although this approach works effectively in most driving scenarios, it may fail when the ground is not flat relative to the ego-vehicle. In this study, we applied the ground removal algorithm presented by [32], which is based on the inclination angle of the LiDAR beams with respect to the horizontal axis. The up-sampling of the 4-Ch image was conducted after the ground removal processing.
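The simplified variant described above (discarding points below the ego-vehicle) can be sketched as follows; this is not the inclination-angle algorithm of [32] that the study actually applied, and the default mounting height is only an illustrative value:

```python
import numpy as np

def remove_ground_by_height(points, sensor_height=1.73):
    """Drop points below the ego-vehicle (naive ground removal).

    points : (N, 3) array of x, y, z in the sensor frame; 1.73 m is the KITTI
    Velodyne mounting height, used here purely as an illustrative default.
    """
    return points[points[:, 2] > -sensor_height]
```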

The 3D detection results presented in Table 1 demonstrate that the ground removal decreased the detection accuracy, indicating that the ground points play an important role in the detection model and cannot be ignored.

#### *5.3. Comparison with Previous Methods*

The performance enhancement of the 3D object detection task achieved by the proposed up-sampling method was compared with widely used previous methods and the SOTA (state-of-the-art) method for 3D up-sampling. The same baseline datasets were used for the up-sampling process, and the point-pillar model was used for detection.

Table 2 presents the values of the detection mAP for the classes of pedestrians, cyclists, and cars that were compared to analyze the detection enhancement presented by different up-sampling methods. The strict intersection over union (IoU) of 0.7 was applied for the evaluation. The abbreviations *E*, *M*, and *H* represent the difficulty levels of easy, moderate, and hard, respectively.
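For reference, the IoU criterion behind the 0.7 threshold is the ratio of intersection to union areas; a minimal 2D axis-aligned sketch is shown below (the KITTI evaluation actually uses 3D and bird's-eye-view boxes):

```python
def iou_2d(box_a, box_b):
    """Axis-aligned 2D intersection over union; boxes are (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))   # overlap width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))   # overlap height
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0.0 else 0.0
```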

The CNN-based up-sampling methods of ESPCN [9] and the method proposed by Shan et al. [5] exhibited inferior 3D detection performances. These methods were able to reconstruct a denser point cloud with a low overall loss score. However, they could not accurately reconstruct the object shapes, thereby leading to inferior detection performance.

Table 2 presents a comparison between the moderate-level class detections. It can be clearly observed that the proposed solution (Case C) exhibited the best performance in each class detection when compared to the existing up-sampling methods. The mAP of the overall class (level M) was 45.4%, which is approximately 7% higher than that of the baseline dataset.


**Table 2.** Comparison table of object detection performance using pretrained point-pillars model (up-sampled from 32-Ch to 64-Ch). The strict IoU of 0.7 was applied.

#### **6. Conclusions**

This study proposed an up-sampling method for a low-resolution LiDAR to enhance 3D object detection by reconstructing the objects in sparse point cloud data into data with a higher vertical angle resolution.

The existing 3D up-sampling methods based on deep neural networks focus on decreasing the overall loss in the up-sampling reconstruction process instead of enhancing the object detection performance. Our proposed sampling method is designed as a pre-processing stage for an enhanced 3D object detection model. First, we converted the 3D point cloud dataset into a 2D range image with 4-Ch. The interpolation of the empty space was calculated based on both the pixel-distance and range values of six neighbor points to conserve the shapes of the original object during the reconstruction. This approach solved the over-smoothing problem faced by the conventional interpolation methods while presenting higher operational speed and better object detection performance when compared to the aforementioned deep-learning super-resolution methods.

The effectiveness of our proposed up-sampling method on the 3D detection was demonstrated by applying it to baseline 32-Ch point cloud data and then feeding them as the input to a point-pillar detection model. The 3D object detection result on the KITTI dataset demonstrates that the proposed method could increase the mAP (strict IoU with 0.7) of pedestrians, cyclists, and cars by 9.2%p, 6.3%p, and 5.9%p, respectively, when compared to the baseline with a low-resolution 32-Ch LiDAR input.

The scope of this study was limited to the reconstruction of 32-Ch LiDAR to 64-Ch LiDAR on the KITTI dataset. In future works, various dataset environments apart from autonomous driving will be analyzed. Additionally, further analysis is required to integrate the proposed up-sampling method with a 3D object detection model based on 2D range images, such as the range-sparse network (RSN) [33], for the construction of an end-to-end network.

**Author Contributions:** Conceptualization, J.Y. and Y.-K.K.; methodology, J.Y. and Y.-K.K.; writing—original draft, J.Y.; writing—review and editing, Y.-K.K.; project administration, Y.-K.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** We thank Hyungseok Song for his assistance in the experiment.

**Conflicts of Interest:** We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work. There is no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in this manuscript.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

## *Article* **Advanced Pedestrian State Sensing Method for Automated Patrol Vehicle Based on Multi-Sensor Fusion**

**Pangwei Wang 1,\*, Cheng Liu 1, Yunfeng Wang 1,2 and Hongsheng Yu <sup>1</sup>**


**Abstract:** At present, the COVID-19 pandemic still presents occasional outbreaks, and pedestrians in public areas are at risk of being infected by the virus. In order to reduce the risk of cross-infection, an advanced pedestrian state sensing method for automated patrol vehicles based on multi-sensor fusion is proposed. Firstly, the pedestrian data output by the Euclidean clustering algorithm and the YOLO V4 network are obtained, and a decision-level fusion method is adopted to improve the accuracy of pedestrian detection. Then, combined with the pedestrian detection results, we calculate the crowd density distribution based on multi-layer fusion and estimate the crowd density in the scenario according to the density distribution. In addition, once a crowd aggregates, the body temperature of the aggregated crowd is detected by a thermal infrared camera. Finally, based on the proposed method, an experiment with an automated patrol vehicle is designed to verify its accuracy and feasibility. The experimental results show that the mean accuracy of pedestrian detection is increased by 17.1% compared with using a single sensor. The area of crowd aggregation is identified, and the mean error of the crowd density estimation is 3.74%. The maximum error between the body temperature detection results and thermometer measurements is less than 0.8 °C, and abnormal temperature targets can be identified in the scenario, which can provide an efficient advanced pedestrian state sensing technique for epidemic prevention and control areas.

**Keywords:** advanced sensing technology; multi-sensor fusion; crowd density estimation; movable temperature detection; automated patrol vehicle

**Citation:** Wang, P.; Liu, C.; Wang, Y.; Yu, H. Advanced Pedestrian State Sensing Method for Automated Patrol Vehicle Based on Multi-Sensor Fusion. *Sensors* **2022**, *22*, 4807. https://doi.org/10.3390/s22134807

Academic Editors: Chao Huang, Yafei Wang, Peng Hang, Zhiqiang Zuo and Bo Leng

Received: 28 April 2022 Accepted: 23 June 2022 Published: 25 June 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

At present, in order to curb the COVID-19 pandemic, governments have divided the current areas into disease control areas and normal areas according to the different status of the epidemic. The source of infection, which may break out in the disease control area, has an important impact on the daily life of the general public. The crowd aggregation and contact temperature measurement method based on hand-held thermometers increases the risk of cross-infection. Therefore, how to reduce the risk of infection for medical practitioners and pedestrians in disease control areas and ensure the effective implementation of epidemic prevention measures have become important issues. In order to solve the above problems, it is necessary to detect crowd aggregation and body temperature rapidly based on advanced sensing technology. Due to the improvement in sensor accuracy and detection algorithms in recent years, sensing technology and application technology are developing rapidly. A connected and automated vehicle (CAV), which is equipped with sensors such as a camera, radar and LiDAR, has been widely used in short-distance logistics distribution and security patrols to improve transportation efficiency and driving safety [1]. As one of the CAVs, the automated patrol vehicle can be used as a new solution for epidemic prevention and control. Therefore, how to improve the accuracy of sensing and apply high-precision sensing technology to the automated patrol vehicle have become key research directions.

However, there are still some problems in the above directions, which can be classified into the following categories: (1) pedestrians in public areas, such as parks, museums and shopping malls, do not follow a fixed route, and the difference in crowd density distribution between adjacent areas is tiny, which makes it difficult to detect dense areas effectively; (2) there are technique and evaluation difficulties in combining high-precision crowd density estimation with body temperature detection. In order to solve the above problems, an advanced pedestrian state sensing method for automated patrol vehicles based on multi-sensor fusion is proposed. Considering the current pandemic situation and the limitations of the experimental platform, a large and enclosed area with low traffic flow is selected as the work scenario of the automated patrol vehicle. Firstly, by fusing the data of the camera and LiDAR, the multi-dimensional data of pedestrians are obtained as the basis of density estimation and face detection. Secondly, the centroids of the clustered pedestrian objects are calculated as the cluster center points. Combined with the positional relationship between pedestrians, crowd targets are detected based on the foreground image. Then, the whole detection area is divided into multiple layers at various scales, and each sub-area of each layer is constructed as the basic unit of the multi-layer fusion method. Through weighted mean theory, the crowd density in the scenario is estimated based on the crowd density distribution matrix. Finally, because of the precision limitation of the sensor, the automated patrol vehicle needs to maintain a fixed distance from crowd targets, and a thermal infrared camera is used to detect the temperature, which can reduce the risk of cross-infection.

#### **2. Related Works**

Advanced sensing technologies have led to the rapid development of CAVs, providing a platform foundation for mobile detection. Meanwhile, with the gradual development of deep learning [2], pedestrian detection, crowd density estimation and pedestrian body temperature detection techniques have become hot topics.

Pedestrian detection is one of the important parts of automated vehicle sensing systems and has been researched for many years. It is mainly achieved through neural networks [3–5] and the open-source computer vision library (OpenCV) [6]. Among them, the convolutional neural network (CNN) proposed by LeCun [7] is representative. Local pixels of the image were perceived by the network and combined into global features in higher layers. This meant that the pooling layer would pay more attention to local features and ignore the overall target. Multi-sensor data fusion methods [8–10] have received more and more attention in recent years. Multi-sensor data fusion is a method to combine data with multi-level and multi-spatial information and output a consistent interpretation of the targets. There are three levels of fusion: (1) data-level fusion; (2) feature-level fusion; and (3) decision-level fusion. In order to reduce the false positive (FP) rate of pedestrians in a complex environment, Han et al. [11] integrated image data and point cloud data deeply through decision-level fusion to improve detection accuracy. Combining the you only look once (YOLO) object detection algorithm and a light field camera, Zhang et al. [12] proposed an indoor obstacle detection algorithm. This algorithm was suitable for detection in the indoor environment but had not been tested in outdoor environments. In order to detect heavily occluded pedestrians, Seong et al. [13] proposed a fusion algorithm of LiDAR and radar, which avoided the camera's strict lighting requirements and was suitable for both indoor and outdoor environments.

Crowd density estimation has received more and more attention in recent years due to the requirements of abnormal event source localization and policy under the pandemic situation. With the advancement in sensor techniques, the current mainstream crowd density estimation is achieved through neural networks [14–18]. Huang et al. [19] proposed a multi-scale fusion conditional generative adversarial network based on a bidirectional fusion module. The network could solve the problem of scale variation effectively. In order to reduce crowd counting errors caused by large viewing angle changes and severe occlusions, Sajid et al. [20] proposed an end-to-end crowd counting network based on a patch rescaling module (PRM) and multi-resolution fusion. In order to make deep features recognizable by a CNN, Nguyen et al. [21] fused depth information into a density estimation algorithm. Through a multi-task learning method, the scale variation problem was solved and crowd counting was achieved accurately. By applying an edge computing method, Wang et al. [22] achieved crowd density estimation efficiently to solve the problem of high network latency caused by the deployment of the density estimation platform on the server; Zhang et al. [23] proposed the adaptive multi-scale context aggregation network (MSCANet) to obtain the full-scale information of the crowd. After the information of different scales was extracted by the network, this information was fused by the network adaptively, which was suitable for crowd counting.

With the gradual development of automated vehicle sensing techniques and body temperature detection techniques, body temperature detection is gradually being applied in the automated vehicle [24,25]. At present, body temperature detection is mainly achieved by the method of thermal imaging [26,27]. With the continuous development of image processing techniques, thermal imaging technology combined with image recognition has become the mainstream method. Based on image processing, Muhammad et al. [28] proposed a pedestrian body temperature detection method to fuse infrared and visible images. The target and the background could be distinguished and the infection problem that might be caused by the gun-type thermometer was avoided by this method. Considering the difficulty of detecting body temperature without distance limitation, this method could measure temperature accurately by strictly fixing the temperature measurement distance and controlling environmental variables. Based on the micro control system 51 series (MCS 51) equipped with infrared temperature measurement function (GY-MLX90614 module), Zhu et al. [29] designed a non-contact infrared temperature measurement system and applied it to the access control platform in the community. As the distances between pedestrians and detective equipment are fixed, the influence of distance on temperature measurement accuracy is reduced. In order to ensure social safety in public areas, Rosepreet et al. [30] proposed an infrared body temperature detection method based on CNN. Non-contact pedestrian body temperature detection was achieved by this method, and the system was applied to the tunnel scenario, improving the detection accuracy by 1.4%.

In the above research achievements, many pedestrian detection methods rely on high-quality pedestrian datasets. Meanwhile, the data fusion algorithms are limited to fixed scenarios. Crowd density estimation is still mainly based on bird's-eye-view data obtained by surveillance cameras, whereas automated-vehicle-based estimation methods remain to be researched. Additionally, there are difficulties in detecting body temperature at unrestricted distances, and the environmental temperature has a large impact on the accuracy of body temperature detection.

Therefore, we propose a novel method to solve the above problems, and the main contributions in pedestrian detection, crowd density estimation and body temperature detection are summarized as follows:


#### **3. Pedestrian State Sensing Method**

Based on intelligent automated vehicles and information fusion technology, an advanced pedestrian state sensing method for automated patrol vehicles based on multi-sensor fusion is proposed to reduce the risk of cross-infection in epidemic prevention and control areas. The pedestrian state includes the pedestrian target, crowd aggregation and pedestrian body temperature. Hence, our method is designed in three parts: pedestrian data acquisition, crowd density estimation and pedestrian body temperature detection. The method includes sensing equipment, decision equipment and control equipment. LiDAR, a camera and a thermal infrared camera are mounted on the body of the automated patrol vehicle as the sensing devices; an industrial personal computer (IPC) with the robot operating system (ROS) is used as the decision-making device; and an STM32 microcontroller is used as the control device. The detection of pedestrian and crowd objects, the estimation of crowd density and the detection of pedestrian body temperature are achieved through the detection method proposed in this paper. The structure of the proposed method is shown in Figure 1.

**Figure 1.** Structure of advanced pedestrian state sensing method for automated patrol vehicle based on multi-sensor fusion.

#### *3.1. Pedestrian Data Acquisition*

How to acquire the pedestrian data is the basis of the proposed method in this paper, which consists of two parts: (1) 2D pedestrian data acquisition based on YOLO V4 algorithm; (2) 3D pedestrian data acquisition based on Euclidean clustering algorithm. The obtained data can be used as the data support for crowd density estimation and temperature detection algorithm.

#### 3.1.1. Pedestrian 2D Data Acquisition

The 2D pedestrian data include the pedestrian detection box and face detection box obtained by a visual algorithm. To implement pedestrian detection with strong timeliness and high precision in a specific enclosed area, the Darknet framework based on the YOLO algorithm is used to obtain 2D image data of pedestrians with different occlusions. Since spatial pyramid pooling (SPP) is adopted in YOLO V4, the network and loss function are optimized, and the intersection over union (IoU) [31] can reflect the degree of intersection between two boxes accurately. Therefore, YOLO V4 is selected as the visual model of the proposed pedestrian detection and face detection algorithm. The structure of pedestrian detection based on YOLO V4 is shown in Figure 2.

**Figure 2.** Structure of pedestrian detection based on YOLO V4.

In order to obtain the 2D pedestrian detection box, the pedestrian dataset of the Massachusetts Institute of Technology (MIT) and the data of pedestrians in the campus environment collected by us are adopted as the input dataset for YOLO network training. By adjusting the brightness and saturation of the pedestrian images, the learning ability and environmental resistance of the pedestrian detection model can be improved. In order to solve the multi-scale problem of target detection, a feature pyramid network (FPN) is used to construct a pyramid on the feature graph. Similarly, to reduce the amount of calculation and ensure accuracy, a cross-stage partial (CSP) module is added to Darknet53 in the YOLO network; pedestrian features are extracted by the first 52 layers and fusion features are output by the last layer. In order to prevent overfitting, DropBlock regularization is used to randomly drop features in the same pedestrian module. The SPP module and path aggregation network (PANet) module are added between the feature extraction network and the final output layer to enhance the pedestrian feature level and complete multi-scale fusion. Finally, the large, medium and small pedestrian targets are predicted, and pedestrian detection boxes are obtained.

In order to obtain face detection boxes, it is necessary to train the YOLO V4 network with a dataset containing facial features. The MIT pedestrian dataset does not contain facial feature information. Therefore, the dataset with face features needs to be added. As Celeba (an open-source dataset of pedestrian faces) [32] does not take up too many resources and the accuracy of face detection box acquisition satisfies the requirements of this paper, Celeba is selected to add to the dataset. The flow chart of training YOLO V4 network and achieving face detection is shown in Figure 3. During training, the adjusted brightness and saturation can improve the robustness of the algorithm. Firstly, we modify Celeba to a dataset format, which can be recognized by YOLO V4. Then, the configuration files, such as "transform.py", "train.txt" and "val.txt", are modified. The number of classifications is adjusted to three types, and the learning rate is adjusted to 0.001. Finally, face detection boxes can be obtained through the YOLO V4 network trained by Celeba.

**Figure 3.** Flow chart of training YOLO V4 network and face detection.

3.1.2. Pedestrian 3D Data Acquisition

Considering the insufficient accuracy of the YOLO V4 algorithm for long-distance detection, LiDAR is used to obtain 3D pedestrian data and increase the dimensionality of the data. The Euclidean clustering algorithm is applied to obtain 3D pedestrian detection boxes; it contains three primary parameters: the clustering radius threshold *R*, the minimum number of clustering points *P*min and the maximum number of clustering points *P*max. The flow chart of pedestrian point cloud detection based on the Euclidean clustering algorithm is shown in Figure 4.

**Figure 4.** Flow chart of pedestrian point cloud detection based on Euclidean clustering algorithm.

In order to obtain 3D pedestrian data through Euclidean clustering, the thresholds need to be set first. Considering the sparse characteristic of point cloud data, different clustering radius thresholds *R* are set according to the sparsity of the point cloud at different distances. Details of the relationship between the pedestrian distance *Lpeds* and the cluster radius threshold *R* are listed in Table 1.

**Table 1.** Corresponding relationship between pedestrian distance and clustering radius threshold.


Then, the set of points to be searched in the point cloud space is defined as *P*. According to the features of *P*, we create a 3D k-dimensional tree (KD-Tree) object. A feature point *Pi* = (*xi*, *yi*, *zi*)*<sup>T</sup>* is selected from *P*, its nearest neighbor points *Pj* (*j* = 1, 2, . . . , *n*) are searched and the Euclidean distance *Dij* between points *Pi* and *Pj* is calculated through the K-nearest neighbor (KNN) algorithm. The Euclidean distance calculation function is represented as Equation (1):

$$D\_{ij} = \sqrt{(x\_i - x\_j)^2 + (y\_i - y\_j)^2 + (z\_i - z\_j)^2} \tag{1}$$

where *xi*, *yi* and *zi* indicate the coordinates of the *i*-th point, and *xj*, *yj* and *zj* indicate the coordinates of the *j*-th point. Once the Euclidean distance *Dij* is less than the clustering radius threshold *R*, the corresponding point *Pj* is classified into the clustered point cloud set *Q*; in this way, each point's membership in *Q* is determined by comparing its Euclidean distance with the threshold.

Finally, according to the preset minimum number of clustering points *P*min and maximum number of clustering points *P*max, we judge whether the clustered point cloud set *Q* represents a valid single pedestrian target. If the number of points in *Q* is greater than *P*min and less than *P*max, the set *Q* is valid; a pedestrian clustering result is generated and the 3D pedestrian data are obtained.
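The clustering procedure described above can be sketched as follows. The range-dependent radius values and the *P*min/*P*max defaults below are illustrative placeholders, not the actual values of Table 1, and a simple region-growing search stands in for the KD-Tree implementation:

```python
import math

# Hypothetical distance-dependent radius table; Table 1's actual values
# are not reproduced here, these numbers are placeholders.
RADIUS_BY_RANGE = [(10.0, 0.3), (20.0, 0.5), (float("inf"), 0.8)]

def cluster_radius(dist):
    """Pick the clustering radius R for a point at range `dist`."""
    for max_range, r in RADIUS_BY_RANGE:
        if dist <= max_range:
            return r

def euclidean_cluster(points, p_min=2, p_max=100):
    """Region-growing Euclidean clustering: a point joins a cluster when its
    distance to any cluster member is below the range-dependent radius R.
    Clusters with fewer than p_min or more than p_max points are discarded."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue, members = [seed], [seed]
        while queue:
            i = queue.pop()
            xi, yi, zi = points[i]
            r = cluster_radius(math.hypot(xi, yi))  # horizontal range to sensor
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) < r]
            for j in near:
                unvisited.discard(j)
            queue.extend(near)
            members.extend(near)
        if p_min <= len(members) <= p_max:
            clusters.append(members)
    return clusters
```

A KD-Tree would only change the neighbor-search complexity, not the clusters produced.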

#### *3.2. Crowd Density Estimation*

At present, crowd aggregation is an important factor causing cross-infection, and it is related to the crowd density of a scenario. In order to locate crowded scenarios accurately, the 2D and 3D pedestrian data obtained in the previous section are combined to improve the accuracy of the estimation algorithm. A multi-layer fusion method is designed for crowd density estimation, which consists of three parts: (1) pedestrian detection based on multi-sensor fusion, (2) crowd target detection based on foreground image and (3) crowd density estimation based on sub-area density.

#### 3.2.1. Pedestrian Detection Based on Multi-Sensor Fusion

After acquiring the 2D and 3D pedestrian data, the multi-source pedestrian data are fused based on confidence optimization. The pedestrian confidence used in the YOLO V4 algorithm has a great influence on its detection results. The value of the confidence reflects how strict the constraints on pedestrian features are: increasing the confidence makes the constraints more stringent, which reduces the false positive (FP) rate, but pedestrians without obvious features may then be missed. Therefore, in order to detect long-distance and occluded pedestrians accurately, the pedestrian confidence probability of the YOLO V4 algorithm is corrected by the Euclidean clustering algorithm. The flow chart of multi-sensor data fusion based on confidence optimization is shown in Figure 5.

**Figure 5.** Flow chart of multi-sensor data fusion based on confidence optimization.

Firstly, the preprocessing of the fusion algorithm is performed based on the Zhang calibration method [33] and timestamp matching. Secondly, the point cloud is matched with the pixel data through the R-Tree algorithm, and the *IoU* between the projected bounding box and the 2D image detection box is calculated and compared against a threshold. Finally, the confidence probability of pedestrians in the image detection algorithm is corrected through the activation function Softmax(), which realizes the fusion of multi-sensor data. The fusion results serve as the data source for crowd density estimation.

#### (1) Multi-sensor data preprocessing

The 2D and 3D detection boxes are input into the preprocessing part once the fusion algorithm starts. Through the Zhang calibration method, we obtain the relationship between the pixel coordinate system and the point cloud coordinate system. The coordinate system calibration of the multi-source sensors is shown in Figure 6. The four dots in different colors represent the vertices of the calibration board, and the lines in different colors represent the internal parameters of the camera and the degree of distortion. Through the calculation of the rotation vector and the translation vector, the matching of the image and the point cloud is achieved. Timestamp matching is resolved by the Time Synchronizer mechanism: the point cloud sampling rate is 10 Hz while the image sampling rate is 25 Hz, so we add timestamps to each frame while collecting data from the LiDAR and camera and call back the data with the same timestamp.
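A minimal sketch of pairing each 10 Hz LiDAR frame with the closest 25 Hz camera frame; the exact callback policy of the Time Synchronizer mechanism is not reproduced here, and the tolerance value is an assumption:

```python
import bisect

def match_timestamps(lidar_ts, camera_ts, tolerance=0.05):
    """For each LiDAR timestamp (~10 Hz), find the camera timestamp (~25 Hz)
    that is closest; drop pairs further apart than `tolerance` seconds.
    Both lists must be sorted ascending. Returns (lidar_t, camera_t) pairs."""
    pairs = []
    for t in lidar_ts:
        i = bisect.bisect_left(camera_ts, t)
        # The nearest camera frame is either just before or just after t.
        candidates = camera_ts[max(0, i - 1):i + 1]
        best = min(candidates, key=lambda c: abs(c - t))
        if abs(best - t) <= tolerance:
            pairs.append((t, best))
    return pairs
```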

(2) Point cloud data and pixel data matching

**Figure 6.** The coordinate system calibration of the multi-source sensors.

The R-Tree algorithm is used to associate and match the detection boxes of the point cloud and the image by projecting the point cloud data onto the pixel plane. Since each frame of image and point cloud data contains several detection boxes, the specific category of a detected target in the point cloud space is determined from the *IoU* between the image detection result and the projection result. We select a single frame for matching each time, and the steps of the fusion algorithm are as follows:

Step 1: A frame of image data and point cloud data, including *m* 3D detection boxes and *n* 2D detection boxes, are input into the algorithm;

Step 2: The *i*-th 3D detection box and the *j*-th 2D detection box are selected. We set the initial values of parameters *i* and *j* to 1;

Step 3: The points in the *i*-th 3D detection box are projected onto the image to generate a 2D bounding box *ii*;

Step 4: The *IoU* of the 2D bounding box *ii* and the *j*-th 2D detection box is calculated. By comparing the *IoU* of the two boxes with the threshold, we determine the category information of the detection result.

*IoU* value is calculated as shown in Equations (2)–(4):

$$\text{intersection } = \left( \min \left( x\_{ii}^{a\_2}, x\_{j}^{b\_2} \right) - \max \left( x\_{ii}^{a\_1}, x\_{j}^{b\_1} \right) \right) \times \left( \min \left( y\_{ii}^{a\_2}, y\_{j}^{b\_2} \right) - \max \left( y\_{ii}^{a\_1}, y\_{j}^{b\_1} \right) \right) \tag{2}$$

$$union = S\_{ii} + S\_j - intersection \tag{3}$$

$$IoU = \frac{intersection}{union} \tag{4}$$

where *intersection* indicates the intersection between the bounding box and the 2D detection box; (*x*<sup>*a*1</sup><sub>*ii*</sub>, *y*<sup>*a*1</sup><sub>*ii*</sub>) and (*x*<sup>*a*2</sup><sub>*ii*</sub>, *y*<sup>*a*2</sup><sub>*ii*</sub>) indicate the coordinates of the upper left and lower right corners of the bounding box; (*x*<sup>*b*1</sup><sub>*j*</sub>, *y*<sup>*b*1</sup><sub>*j*</sub>) and (*x*<sup>*b*2</sup><sub>*j*</sub>, *y*<sup>*b*2</sup><sub>*j*</sub>) indicate the coordinates of the upper left and lower right corners of the 2D detection box; *union* indicates the union of the two 2D detection boxes; *Sii* indicates the area of the point cloud detection box after mapping; and *Sj* indicates the area of the 2D detection box.
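Equations (2)–(4) can be implemented directly; the corner convention, (x1, y1) upper-left and (x2, y2) lower-right in image coordinates, follows the definitions above:

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2), following
    Equations (2)-(4)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = min(ax2, bx2) - max(ax1, bx1)  # Equation (2), x extent
    inter_h = min(ay2, by2) - max(ay1, by1)  # Equation (2), y extent
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # boxes do not overlap
    intersection = inter_w * inter_h
    # Equation (3): union = S_ii + S_j - intersection
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union  # Equation (4)
```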

#### (3) Pedestrian confidence correction

The fusion threshold of the intersection ratio is set to 0.75 [34], and the value of the *IoU* is compared with the fusion threshold to judge whether the current bounding box represents the pedestrian target. If the *IoU* is less than the fusion threshold, the label of the 2D bounding box is updated to be non-pedestrian; if the current *IoU* is greater than the fusion threshold and the clustering detection result is non-pedestrian, the confidence probability of the target is increased based on activation function Softmax(). Through adjusting the pixel and point cloud weight values, the correction effect is achieved. The confidence probability correction function is represented as Equation (5):

$$P = \max\left(Softmax\left(W\_{LiDAR}, W\_{vision}\right)\right), \qquad Softmax\left(W\_i\right) = \frac{e^{W\_i}}{\sum\_{i=1}^{n} e^{W\_i}} \tag{5}$$

where *P* indicates the value of the probability correction, *Wi* indicates the weight of the *i*-th sensor, with *i* ranging over the vision and LiDAR sensors, and *n* indicates the number of sensors.
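A sketch of the correction of Equation (5); how the raw sensor weights *W*LiDAR and *W*vision are chosen is not specified in the text, so they are plain inputs here:

```python
import math

def softmax(weights):
    """Softmax over the sensor weights, as in Equation (5)."""
    exps = [math.exp(w) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]

def corrected_confidence(w_lidar, w_vision):
    """Corrected pedestrian confidence: the largest softmax component over
    the LiDAR and vision weights. The weights themselves are assumed to be
    produced upstream by the fusion algorithm."""
    return max(softmax([w_lidar, w_vision]))
```

With equal weights the correction is neutral (0.5); raising one sensor's weight raises the corrected confidence toward 1.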

#### 3.2.2. Crowd Target Detection Based on Foreground Image

Crowd aggregation is one of the sources of virus spread, and body temperature detection without distance restriction is currently difficult. Therefore, it is necessary to detect the aggregated crowd first and then detect the temperature of pedestrians in real time. The camera mounted on the automated patrol vehicle can only acquire foreground image data; due to occlusion and the lack of depth information, the accuracy of detecting crowd targets based on foreground images cannot satisfy the patrol requirements. The 3D point cloud data have advantages in spatial sensing of the environment. Since the crowd target consists of multiple single pedestrians, we select the Euclidean clustering algorithm to detect the crowd target. Therefore, based on LiDAR and camera, a perspective transformation algorithm is designed. The structure of perspective transformation is presented in Figure 7.

**Figure 7.** Structure of perspective transformation.

Through perspective transformation, the pedestrian detection box in the 2D image is mapped to a point in the top view. The perspective transformation function is represented as Equation (6):

$$
\begin{bmatrix} Y\_i \\ 1 \end{bmatrix}\_{3\times 1} = \begin{bmatrix} A\_{2\times 2} & t\_{2\times 1} \\ 0\_{1\times 2} & E\_{1\times 1} \end{bmatrix} \begin{bmatrix} X\_i \\ 1 \end{bmatrix}\_{3\times 1} \tag{6}
$$

where *Yi* takes the corner coordinate values of the three groups of detection boxes after transformation, *Xi* takes the corner coordinate values of the three groups of detection boxes before the transformation, *A* represents the 2 × 2 transformation matrix and *t* the translation vector.
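For a single 2D point in homogeneous coordinates, Equation (6) reduces to Y = A·X + t; a minimal sketch:

```python
def to_top_view(point, A, t):
    """Map an image point X = (x, y) to top-view coordinates Y via the
    homogeneous transform of Equation (6): [Y; 1] = [[A, t], [0, 1]] [X; 1],
    i.e. Y = A.X + t. A is a 2x2 row-major matrix, t a length-2 translation."""
    x, y = point
    yx = A[0][0] * x + A[0][1] * y + t[0]
    yy = A[1][0] * x + A[1][1] * y + t[1]
    return (yx, yy)
```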

Combined with the point cloud data, we assign the coordinate information of each target to the mapped pedestrian target points. The point set is clustered by the Euclidean clustering algorithm. The KNN algorithm based on a KD-Tree object is an important preprocessing step of the Euclidean clustering algorithm: it avoids errors in the parameter estimation process and has high classification accuracy. KD-Tree is therefore used to traverse the adjacent pedestrian targets and obtain the minimum distance between pedestrians. The algorithm flow of the minimum distance calculation between pedestrians in 3D space is as follows:

Step 1: The pedestrian target (3D target recognition box, confidence degree) obtained by the data fusion algorithm is used as the input of the algorithm;

Step 2: The centroid coordinates of each pedestrian target (*XPeds*, *YPeds*, *ZPeds*) are extracted by the algorithm, and the dimension *n* of the current pedestrian centroid dataset is calculated;

Step 3: The dimension *d* with the largest variance and the median *m* of all data items on dimension *d* are found. The dataset is divided according to the median *m*, which is divided into two parts, and the two data subsets are recorded as *Dl* and *Dr*, respectively;

Step 4: A pedestrian centroid KD-Tree node is established to store the division dimension *d* and the median *m*. Step 2 is performed again on the data subsets *Dl* and *Dr* divided in Step 3, and the newly generated centroid KD-Tree nodes are set as the left and right child nodes;

Step 5: The dataset is continuously divided until the pedestrian centroid data subset contains less than two pieces of data after the current division. The corresponding centroid data are saved to the last leaf node, which is the KD-Tree object of the pedestrian centroid;

Step 6: Each pedestrian target is traversed by the KNN algorithm to find its nearest pedestrian. The Euclidean distance between the current pedestrian and the closest pedestrian is calculated through the KNN algorithm;

Step 7: Pedestrian minimum distance sequence is obtained by minimum distance calculation algorithm.
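Steps 1–7 can be condensed as follows; a brute-force nearest-neighbour search stands in for the KD-Tree here (same output, different search complexity), and only the centroid coordinates are used:

```python
import math

def min_distance_sequence(centroids):
    """For each pedestrian centroid (x, y, z), return the Euclidean distance
    to its nearest other pedestrian -- the minimum distance sequence of
    Steps 1-7. A KD-Tree would only speed up the neighbour search."""
    seq = []
    for i, p in enumerate(centroids):
        nearest = min(math.dist(p, q)
                      for j, q in enumerate(centroids) if j != i)
        seq.append(nearest)
    return seq
```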

The clustering threshold is one of the most important parameters in the Euclidean clustering algorithm, and it must be set reasonably. Combined with the minimum distance sequence of pedestrians obtained by the above algorithm, the distance features of pedestrians are distinguished. Because the objective of clustering changes, the thresholds are set according to the following two clustering conditions existing in the work scenario of the automated patrol vehicle:

Condition 1: There is a long distance between the center point and each clustering point, as shown in Figure 8a;

**Figure 8.** Detection results of crowd target. (**a**) Euclidean cluster in Condition 1; (**b**) Euclidean cluster in Condition 2.

Condition 2: The center of the cluster point is in the scenario with high crowd density, as shown in Figure 8b.

For Condition 1, the centroid point cloud of each pedestrian is used by the clustering algorithm to represent the current pedestrian target, and the centroid point cloud of all pedestrians is classified into a new clustering space. Through the experiments in Section 4, the clustering threshold is selected as 0.5 m and the pedestrian centroid point cloud is clustered. The requirements of crowd target detection are satisfied. For Condition 2, areas with high density are regarded as complete crowd targets by the clustering algorithm to achieve crowd detection.

As shown in Figure 8, a single black point in the scenario represents the pedestrian target detected by the algorithm, and the red point represents the non-pedestrian target. Firstly, the non-pedestrian target is eliminated through the algorithm. Then, two adjacent pedestrian targets are clustered based on the KNN algorithm. The clustered target is used as a new individual to cluster again until there is no single pedestrian target in the scenario. Finally, crowd targets are obtained as the algorithm output.

#### 3.2.3. Crowd Density Estimated Based on Sub-Area Density

In order to improve the effect of density estimation, a crowd density estimation algorithm based on sub-area density is proposed to expand the differences in the distribution characteristics of crowd density. The layer is divided into equal-area units by the algorithm, and the density values of the units are taken as its input. By fusing crowd density data from multiple layers, the crowd density distribution is calculated, and, based on the weighted mean theory, the crowd density of the current scenario is estimated.

According to the number of pedestrian targets detected by the automated patrol vehicle in the current scenario, the layer, with a side length of 15 m, is divided into 32 × 32 units by the algorithm. Each unit covers an area of 0.22 square meters, which fits the minimum footprint of an adult. The density value of a unit is the ratio of the number of pedestrians in the unit to its area, as shown in Equation (7):

$$\left\{\rho\_{(i,j)} \,\middle|\, \rho\_{(i,j)} = \frac{N\_{ij}}{S},\; i, j \in 50 \right\} \tag{7}$$

where *ρ*(*i*,*j*) indicates the density value of unit, *Nij* indicates the number of pedestrians in the unit of the *i*-th row and the *j*-th column and *S* indicates the area of a single unit within the layer. The calculation algorithm of *Nij* is illustrated in Algorithm 1.
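Algorithm 1 is not reproduced in this extraction; the following sketch shows one way the counts *Nij* and the unit densities of Equation (7) could be computed for the 15 m, 32 × 32 layer, assuming pedestrian positions are already expressed in the layer's coordinate frame:

```python
def density_matrix(pedestrians, side=15.0, cells=32):
    """Initial crowd density matrix M_old (Equations (7)-(8)): count the
    pedestrians N_ij that fall in each unit of a cells x cells grid over a
    square layer of side `side` metres, then divide by the unit area S.
    `pedestrians` is a list of (x, y) positions in the layer frame."""
    unit = side / cells          # 15 / 32 = 0.46875 m per unit side
    area = unit * unit           # unit area S (~0.22 m^2)
    counts = [[0] * cells for _ in range(cells)]
    for x, y in pedestrians:
        if 0 <= x <= side and 0 <= y <= side:
            i = min(int(y // unit), cells - 1)  # row index
            j = min(int(x // unit), cells - 1)  # column index
            counts[i][j] += 1
    return [[n / area for n in row] for row in counts]
```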


The initial crowd density distribution matrix *Mold* is composed of the density values of unit *ρ*(*i*,*j*). The numerical values in each row and column of the matrix are the density values of unit. The initial crowd density distribution matrix is represented as Equation (8):

$$M\_{old} = \begin{bmatrix} \rho(1,1) & \rho(1,2) & \dots & \rho(1,j) \\ \rho(2,1) & \ddots & \rho(2,j-1) & \rho(2,j) \\ \vdots & \rho(i-1,2) & \ddots & \rho(i-1,j) \\ \rho(i,1) & \rho(i,2) & \dots & \rho(i,j) \end{bmatrix}, i, j \in 50 \tag{8}$$

The initial crowd density distribution matrix represents only the current density at each individual point. However, density is an attribute of the crowd as a whole, so an arrangement of individual point density values alone is not representative. Therefore, the density features of adjacent units in the layer are correlated by the algorithm. After the corresponding unit density values of different layers are linearly combined, unit density values containing the features of adjacent units are obtained. The matrix *Mnew*, composed of these unit density values, is the final crowd density distribution matrix. The structure of crowd density distribution detection based on the multi-layer fusion algorithm is shown in Figure 9.

**Figure 9.** Structure of crowd density distribution detection based on the multi-layer fusion algorithm.

A 2D coordinate system is established according to the initial position of the automated patrol vehicle in Section 3.2.1. Along the initial direction of the automated patrol vehicle, 2 × 2 groups of units are sequentially merged into single units of the next layer, producing a new 16 × 16 layer. The side length of any 2 × 2 group in the current layer is 0.9 m. The pedestrian density value in each 2 × 2 group is counted and superimposed with the density value of the corresponding unit in the upper layer. The superposition is represented as Equation (9):

$$\rho\_{(i,j)}^k = \sum\_{i=1, j=1}^2 \rho\_{(i,j)}^{k-1} \tag{9}$$

where *ρ*<sup>*k*</sup><sub>(*i*,*j*)</sub> indicates the density value of the unit in the *i*-th row and the *j*-th column after the *k*-th division, and *ρ*<sup>*k*−1</sup><sub>(*i*,*j*)</sub> indicates the density value of the unit in the *i*-th row and the *j*-th column after the (*k* − 1)-th division.

It is worth mentioning that the total area of the single layer does not change after the layer is divided by the above algorithm. The 2 × 2 units are finally iterated by the algorithm. We count the density values of the units after each division, and, based on the fusion of multi-layer by weighting, the density distribution matrix *Mnew* is obtained as the output of the algorithm. The weighted superposition is represented as Equation (10):

$$M\_{new} = \begin{bmatrix} \sum\_{k=0}^{n} w\_k \cdot \rho\_{(1,1)}^k & \sum\_{k=0}^{n} w\_k \cdot \rho\_{(1,2)}^k & \dots & \sum\_{k=0}^{n} w\_k \cdot \rho\_{(1,j)}^k\\ \sum\_{k=0}^{n} w\_k \cdot \rho\_{(2,1)}^k & \ddots & \sum\_{k=0}^{n} w\_k \cdot \rho\_{(2,j-1)}^k & \sum\_{k=0}^{n} w\_k \cdot \rho\_{(2,j)}^k\\ \vdots & \sum\_{k=0}^{n} w\_k \cdot \rho\_{(i-1,2)}^k & \ddots & \sum\_{k=0}^{n} w\_k \cdot \rho\_{(i-1,j)}^k\\ \sum\_{k=0}^{n} w\_k \cdot \rho\_{(i,1)}^k & \sum\_{k=0}^{n} w\_k \cdot \rho\_{(i,2)}^k & \dots & \sum\_{k=0}^{n} w\_k \cdot \rho\_{(i,j)}^k \end{bmatrix}, i, j \in 50 \tag{10}$$

where *wk* indicates weight of the *k*-th division. In order to improve the detection efficiency, the harmonic series is used as the weight update function to magnify the difference between the crowd density in different units. The weight update function is represented as Equation (11):

$$w\_k = \frac{1}{k+1}, k \in [0, N] \tag{11}$$

where *k* indicates the current superposition times and *N* indicates the total number of divisions.

Real-time detection in disease control areas is very important. Due to the multidimensional nature of crowd features, the weighted mean theory used in the algorithm should have high real-time performance and the density estimation should satisfy the requirements of patrolling. Therefore, we combine the comparative requirements of the crowd density of different layers, and, based on the weighted mean theory, estimate the crowd density in the current scenario. The estimation equation is represented as Equation (12):

$$\rho\_{crowd} = \frac{\sum\_{i} \sum\_{j} \sum\_{k=0}^{n} w\_k \cdot \rho\_{(i,j)}^k}{\iint d\sigma}, i, j \in 50 \tag{12}$$

where *ρcrowd* indicates the crowd density and *σ* indicates the overall area of the layer.
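The multi-layer fusion of Equations (9)–(12) can be sketched as follows. Since each fused layer has a different shape, the harmonically weighted cell sums are accumulated directly rather than materialized as one *Mnew* matrix, which is a simplification of the presentation above:

```python
def pool2x2(m):
    """Equation (9): sum each 2x2 block of densities to form the next layer."""
    n = len(m) // 2
    return [[m[2*i][2*j] + m[2*i][2*j+1] + m[2*i+1][2*j] + m[2*i+1][2*j+1]
             for j in range(n)] for i in range(n)]

def crowd_density(m0, side=15.0):
    """Equations (10)-(12): build coarser layers by repeated 2x2
    superposition, weight layer k by the harmonic term w_k = 1/(k+1)
    (Equation (11)), and divide the weighted sum of all cell densities by
    the layer area (Equation (12))."""
    layers, m = [m0], m0
    while len(m) >= 2:
        m = pool2x2(m)
        layers.append(m)
    weighted = sum(sum(sum(row) for row in layer) / (k + 1)
                   for k, layer in enumerate(layers))
    return weighted / (side * side)
```

For a 32 × 32 initial matrix this produces the 16 × 16, 8 × 8, …, 1 × 1 layers described in the text.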

#### *3.3. Pedestrian Body Temperature Detection*

At present, more than 40% of patients who test positive for the coronavirus show high temperature features in their clinical manifestations [35], and the current risk of epidemic spreading can be assessed by temperature. Therefore, a body temperature detection algorithm based on thermal imaging is designed after the density estimation, so that the risk of cross-infection can be reduced by temperature detection in high crowd density areas. Owing to the accuracy limitations of current temperature sensors, it is difficult to detect temperature accurately at arbitrary distances. Therefore, in order to reduce the cost of temperature detection and improve the accuracy, the distance from the sensor to the target to be detected is fixed at 3 m, and an external blackbody, an empirical method and outlier filtering are used to correct the detection results. The flow chart of pedestrian body temperature detection is shown in Figure 10.

**Figure 10.** Flow chart of pedestrian body temperature detection.

3.3.1. Infrared Image and Visible Image Registration

In order to combine the fused image data with the infrared data, a registration method is used to fuse the multi-sensor data. Color image data are output by the infrared camera in real time, in which each pixel represents a temperature value. By detecting the temperature values of the face area, we calculate the current pedestrian's body temperature. However, since the angle and orientation of the infrared and visible images of the same target differ, it is necessary to register the infrared image and the visible image. Several matching points in the infrared and visible images are selected, and we register the two images by means of affine transformation. The affine transformation function is shown in Equation (13):

$$
\begin{bmatrix} \mathbf{x}\_{i}^{\prime} \\ \mathbf{y}\_{i}^{\prime} \end{bmatrix} = A \begin{bmatrix} \mathbf{x}\_{i} \\ \mathbf{y}\_{i} \end{bmatrix} + BI \tag{13}
$$

where *A* and *B* indicate the parameters of the affine transformation equation, and (*xi*, *yi*) and (*x′i*, *y′i*) indicate a pair of corresponding coordinates in the visible and infrared images. The forms of parameters *A* and *B* are shown in Equation (14):

$$\begin{cases} \quad A = \begin{bmatrix} a\_{11} & a\_{12} \\ a\_{21} & a\_{22} \end{bmatrix} \\ \quad B = \begin{bmatrix} b\_{1} \\ b\_{2} \end{bmatrix} \end{cases} \tag{14}$$

In order to determine the two parameters in the image registration process, different corners of the detection screen are obtained by the automated patrol vehicle in real time. When more than three groups of corners are extracted, the optimal parameters of the affine transformation equation can be fitted by the least square method or the gradient descent method. Since the least square method finds the global solution rather than the local optimum that the gradient descent method may return, multiple matched corners are selected to fit the optimal solution based on the least square method. The infrared image and visible image are then registered by the equation after the affine transformation parameters are obtained.
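With exactly three non-collinear corner pairs, the least-squares fit of the parameters in Equation (14) reduces to solving two 3 × 3 linear systems exactly; a sketch using Cramer's rule (the over-determined case with more pairs is not handled here):

```python
def fit_affine(src, dst):
    """Fit A (2x2) and B (2x1) of Equation (13) from three point pairs,
    with dst[i] = A.src[i] + B. Each row of [A | B] solves
    a*x + b*y + c = v at the three source points (Cramer's rule)."""
    (x1, y1), (x2, y2), (x3, y3) = src[:3]
    # Determinant of the design matrix [[x, y, 1], ...]; zero if collinear.
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)

    def solve_row(v1, v2, v3):
        a = (v1 * (y2 - y3) - y1 * (v2 - v3) + (v2 * y3 - v3 * y2)) / det
        b = (x1 * (v2 - v3) - v1 * (x2 - x3) + (x2 * v3 - x3 * v2)) / det
        c = (x1 * (y2 * v3 - y3 * v2) - y1 * (x2 * v3 - x3 * v2)
             + v1 * (x2 * y3 - x3 * y2)) / det
        return a, b, c

    a11, a12, b1 = solve_row(*[p[0] for p in dst[:3]])  # x' row
    a21, a22, b2 = solve_row(*[p[1] for p in dst[:3]])  # y' row
    return [[a11, a12], [a21, a22]], [b1, b2]
```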

#### 3.3.2. Human Body Temperature Detection and Correction

After registration, corresponding points can be found in both the infrared image and the visible image, which establishes the mapping relationship to match the face area. Considering the accuracy of the infrared sensor, we set the temperature detection distance between 3 and 8 m. However, there are temperature interference factors such as hats, glasses and hair in the actual scenario, which make the measured temperature lower than the real value. Therefore, using a 5 × 5 template within the region of interest (ROI), the mode of the temperatures in each template is calculated by the algorithm, and the template modes are averaged as the pedestrian's body temperature. The preset template is used to traverse the face area after the face target is detected in the dense area, and temperature data are extracted for each pixel within the template. The pedestrian temperature calculation method is represented as Equation (15):

$$T\_i = \frac{\sum\_{j=1}^{n} \mathcal{M}(j)}{n} \tag{15}$$

where *Ti* indicates the temperature of the *i*-th pedestrian, *n* indicates the number of templates traversed within the face detection box and *M*(*j*) indicates the mode of the infrared detection temperatures in a single 5 × 5 template.
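A sketch of Equation (15): the 5 × 5 template traversal, the per-template mode *M*(*j*) and the final average; the non-overlapping window layout over the face ROI is an assumption, since the traversal stride is not specified:

```python
from statistics import mean, mode

def pedestrian_temperature(face_pixels, template=5):
    """Equation (15): slide a template x template window over the face ROI
    temperature map, take the mode M(j) of each window, and average the
    modes to obtain the pedestrian temperature T_i.
    `face_pixels` is a 2D list of per-pixel temperatures."""
    rows, cols = len(face_pixels), len(face_pixels[0])
    modes = []
    for r in range(0, rows - template + 1, template):
        for c in range(0, cols - template + 1, template):
            window = [face_pixels[r + dr][c + dc]
                      for dr in range(template) for dc in range(template)]
            modes.append(mode(window))  # M(j): most frequent value wins
    return mean(modes)
```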

Meanwhile, hardware errors are still inevitable, and the algorithm's detection results can differ considerably from the values detected by a thermometer. Considering that the thermometer provides the most accurate body temperature measurement, it is reasonable to use it as the standard. Therefore, we combine the algorithm's detection results with the thermometer's readings to correct the body temperature detection algorithm by an empirical method. The temperature deviation caused by the occlusion of glasses and other facial decorations can be reduced effectively, and the local abnormal temperature caused by the sensor distance accuracy can be reduced by taking the mean value. The correction algorithm is shown as Equation (16):

$$T\_p = T\_{f\_d} + \sqrt{\frac{T\_{f\_d}^2}{T\_{f\_d}^2 + \left(T\_{f\_d} - T\_{env}\right)^2}} \times \left(T\_t - \frac{\sum\_{i=1}^n T\_i}{n}\right) \tag{16}$$

where *Tp* indicates the detection temperature obtained after correction, *T<sub>f_d</sub>* indicates the result of infrared temperature measurement, *Tenv* indicates the current environmental temperature, *Tt* indicates the current pedestrian's body temperature measured by a thermometer and *n* indicates the number of selected measurements.

In addition, errors in temperature detection are mainly due to temperature drift caused by environmental changes and long-term operation. The probe of an infrared detector is exposed to infrared radiation, including the air temperature and radiation generated by the surrounding environment. Changes in the environment alter the state of this infrared radiation, which reduces the accuracy of temperature detection. The long-term operation of automated patrol vehicles can also cause temperature drift. Therefore, we adopt external blackbody correction and outlier filtering to further improve the accuracy of temperature detection. By placing the external blackbody in front of the infrared camera, the error of temperature detection can be reduced and the stability of the system can be improved. The external blackbody correction coefficient function is represented as Equation (17):

$$K = \frac{T\_0}{T\_a} \tag{17}$$

where *K* indicates the temperature correction coefficient, *T*<sup>0</sup> indicates the absolute temperature set by the external blackbody and *Ta* indicates the absolute temperature detected at the external blackbody. The external blackbody is a constant temperature source, whose temperature *T*<sup>0</sup> is fixed at 37 °C.

Moreover, outliers are usually caused by pedestrian occlusion, face detection errors, environmental temperature changes and other factors in the process of body temperature detection. In order to ensure the accuracy of temperature detection, detected temperature values below 35 °C or above 42 °C are removed with reference to the normal body temperature range of pedestrians.
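The blackbody correction coefficient of Equation (17) and the outlier filter can be sketched as follows; using Celsius values directly in the ratio is a simplification of the "absolute temperature" wording in the text:

```python
def blackbody_coefficient(t_set, t_detected):
    """Equation (17): K = T0 / Ta, the ratio of the blackbody's set
    temperature (fixed at 37 degrees in the text) to the temperature the
    infrared camera actually reads for it."""
    return t_set / t_detected

def filter_outliers(temps, low=35.0, high=42.0):
    """Drop physiologically implausible readings (< 35 or > 42 degrees C),
    as caused by occlusion, face detection errors or environment changes."""
    return [t for t in temps if low <= t <= high]
```

Multiplying raw readings by *K* compensates drift; a camera reading the blackbody correctly gives *K* = 1.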

#### **4. Experimental Results and Discussion**

In this section, a detection platform and scenario are constructed based on the automated patrol vehicle. Considering the current situation of the pandemic, a straight road on the campus is chosen for the experiment. There are entrances for pedestrians on the north and south sides of the straight road, and there are convenience stores and other factors at the east end that cause pedestrian aggregation, as shown in Figure 11a. For this scenario, we design an automated patrol vehicle that applies the proposed detection method. The automated patrol vehicle is equipped with a mechanical 16-line LiDAR, a 720P USB industrial camera, a thermal infrared camera, an Ackermann vehicle chassis, an industrial computer and a monitor screen, as shown in Figure 11b. The measuring range of the mechanical LiDAR does not exceed 150 m, and its minimum measuring range is 0.5 m. It is mounted on the vehicle at a height of 0.5 m. The camera is mounted 0.4 m directly below the LiDAR, and the thermal infrared camera is mounted 0.6 m directly below the camera. The maximum velocity of the automated patrol vehicle is 4 m/s, which satisfies the security requirements of patrolling on campus.

**Figure 11.** Experimental platform based on automated patrol vehicle. (**a**) Experimental scenario. (**b**) Automated patrol vehicle.

Then, the feasibility and accuracy of pedestrian data acquisition, crowd density estimation and real-time body temperature detection are analyzed and discussed.

The experiments of pedestrian data acquisition consist of two parts: (1) feasibility of pedestrian data acquisition; (2) pedestrian detection accuracy verification. The 2D pedestrian data acquisition results using the YOLO V4 algorithm are shown in Figure 12a, where the rectangular boxes are the detection results of pedestrians. The 3D pedestrian data acquisition results using the Euclidean clustering algorithm are shown in Figure 12b, where the different colored stereoscopic boxes are the detection results of pedestrians. The image data after data fusion are shown in Figure 12c; the red dots represent the pedestrian detection results based on YOLO V4, and the green dots represent the pedestrian detection results based on the Euclidean clustering algorithm. The results of face detection based on the YOLO V4 network trained on CelebA are shown in Figure 12d.

**Figure 12.** The results of pedestrian detection and face detection. (**a**) 2D detection box. (**b**) 3D detection box. (**c**) Fusion results. (**d**) Face detection results.

As shown in Figure 12, occluded pedestrians are detected well by the trained YOLO V4 network, while long-distance pedestrians are detected by the clustering algorithm, which also provides the location information of pedestrians. Combining the multi-sensor features, the detection range and feasibility of the fusion algorithm exceed those of a single sensor. As shown in Figure 12d, face targets are detected accurately by the trained YOLO V4 network.

The FP rate index is used to verify the accuracy of the pedestrian detection algorithm under different pedestrian occlusion rates and in different scenarios. Considering the moving velocity of the automated patrol vehicle and the sensing range of the sensors, we fixed the distance between the automated patrol vehicle and the pedestrian target at 10 m. The academic building and an outdoor space are used as the experimental scenarios to verify the effects of different lighting and environmental complexity on pedestrian detection. In order to verify the detection accuracy after data fusion, pedestrian occlusion is defined in four levels [36]: no occlusion, partial occlusion, severe occlusion and complete occlusion. The fitting curves of the false positive rates under the different scenarios are shown in Figure 13.

**Figure 13.** Fitting curves of false positive rate under different scenarios. (**a**) Outdoor environment. (**b**) Academic building.

The lower the FP rate of pedestrian targets, the better the detection effect of the fusion algorithm. With the increase in the pedestrian occlusion rate, the FP rate of pedestrian targets will gradually increase. The FP rate of the fusion algorithm is lower than 5% when the pedestrian occlusion rate is below 20% in the academic building and outdoor environment, as shown in Figure 13. Therefore, the fusion algorithm has higher accuracy and stronger robustness compared to the single-sensor detection algorithm. The accuracy of the pedestrian detection method based on the fusion algorithm satisfies the requirements of the crowd density estimation algorithm.

Moreover, we evaluate the effectiveness of crowd density estimation in experimental scenarios through a three-part experiment, which includes: (1) crowd clustering threshold selection and crowd detection implementation; (2) crowd density distribution verification; (3) the accuracy of crowd density estimation verification.

The clustering performance for crowd targets depends directly on the clustering threshold, so selecting a suitable threshold is important for obtaining good clustering results. We therefore conduct experiments on the selection of the clustering threshold. The result of crowd clustering detection with the threshold set to 0.5 is shown in Figure 14. In Figure 14a, the red dots represent the pedestrian detection results based on YOLO V4, and the green dots represent the pedestrian detection results based on the Euclidean clustering algorithm. In Figure 14b, the different colored stereoscopic boxes represent the detection results for crowd targets.

**Figure 14.** Detection results of crowd target when the clustering threshold is set to 0.5. (**a**) The crowd aggregation state. (**b**) The detection results of crowd target.

As shown in Figure 14, the larger the clustering threshold, the more blurred the boundary of the crowd target, and the more the algorithm tends to cluster scattered points into the detection result, which does not satisfy our requirements. If a small clustering threshold is selected, the number of clustered crowd objects becomes very large and each cluster contains fewer pedestrians; fewer pedestrians then have their body temperature detected, which increases the probability of infection risk. Therefore, based on a comprehensive judgment of the correlation between the threshold and the number of clusters, we set the clustering threshold to 0.5 m. Under this threshold, the crowd target clustering results satisfy our requirements.
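As a hedged illustration of Euclidean clustering with a 0.5 m threshold (a minimal sketch, not the authors' exact implementation; all names are illustrative), pedestrians can be grouped as connected components of a threshold graph:

```python
import numpy as np

def euclidean_cluster(points, threshold=0.5):
    """Group 2-D pedestrian positions (metres) into crowd clusters.

    Two pedestrians end up in the same cluster if they are connected by a
    chain of neighbours, each consecutive pair closer than `threshold`
    (connected components of the threshold graph, i.e. Euclidean clustering).
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(points[i] - points[j]) <= threshold:
                parent[find(i)] = find(j)

    return np.array([find(i) for i in range(n)])

# Two tight pairs (0.3 m apart internally) separated by ~7 m.
pts = np.array([[0.0, 0.0], [0.3, 0.0], [5.0, 5.0], [5.3, 5.0]])
labels = euclidean_cluster(pts, threshold=0.5)
```

With the 0.5 m threshold, the two nearby pairs form two separate crowd clusters, mirroring the behavior described above.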

The number of pedestrians in each unit is detected by the automated patrol vehicle and, combined with the area of the unit, the crowd density in the unit is estimated. We use a 3D thermodynamic chart to show the current crowd density distribution in the scenario, as shown in Figure 15. To demonstrate the better performance of our method, the initial density distribution matrix is used for comparison.

**Figure 15.** Visualizations of crowd density distribution detection results. (**a**) Initial crowd density distribution *Mold*. (**b**) Final crowd density distribution *Mnew*.

Crowd density is proportional to the height of its thermal peak. Columns with low density values are shown in blue, and the color shifts to yellow as density increases. The initial density distribution matrix *Mold* is shown in Figure 15a, and the final density distribution matrix *Mnew* calculated by our algorithm is shown in Figure 15b. In *Mold*, the aggregation areas are not prominent and the density distribution is nearly uniform. As shown in Figure 15b, there are three peaks in the current scenario, each representing a group of crowd aggregation. The detection result is consistent with the manually measured density distribution. The crowd density in the scenario is estimated based on this distribution, and we compare the detection results with the manually measured ground truth to obtain the density error, as shown in Table 2.
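A minimal sketch of how a density distribution matrix might be built from per-unit pedestrian counts (the uniform grid, function names and sizes are illustrative assumptions, not the paper's exact method):

```python
import numpy as np

def density_matrix(positions, grid_shape, cell_size):
    """Estimate a crowd-density distribution matrix (pedestrians / m^2).

    `positions` are (x, y) pedestrian detections in metres, `grid_shape`
    the number of square units per axis and `cell_size` the side length
    of one unit. Each unit's count is divided by its area.
    """
    M = np.zeros(grid_shape)
    for x, y in positions:
        i, j = int(y // cell_size), int(x // cell_size)
        if 0 <= i < grid_shape[0] and 0 <= j < grid_shape[1]:
            M[i, j] += 1.0
    return M / (cell_size ** 2)   # counts per unit -> density

# Three detections on a 10 m x 10 m area split into 2 x 2 units of 5 m.
peds = [(1.0, 1.0), (1.5, 1.2), (8.0, 8.5)]
M = density_matrix(peds, grid_shape=(2, 2), cell_size=5.0)
```

The resulting matrix plays the role of *Mnew* above: peaks in `M` correspond to crowd aggregation areas.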


**Table 2.** Analysis of the accuracy of crowd density estimation.

Five crowd groups were preset in the experimental scenario, recorded as scenarios 1-5; the larger the scenario number, the more pedestrians in the scenario. As the number of pedestrians increases, the crowd density increases gradually, as shown in Table 2. The density error is largest for scenario 4 and smallest for scenario 1; in scenario 4, three or more pedestrians are completely occluded within the crowd, which explains the largest error. The experimental results show that the error of the proposed algorithm in detecting crowd density is at most 6.54% and at least 1.11%. The detection accuracy satisfies the requirements of crowd density estimation.

The feasibility of the body temperature detection algorithm is verified through two-part experiments: (1) real-time body temperature detection verification; (2) body temperature detection accuracy verification. The body temperature of pedestrians is detected in real time by the automated patrol vehicle, and the temperature detection results are displayed visually, as shown in Figure 16.

**Figure 16.** Results of temperature detection. (**a**) Original image. (**b**) Face detection. (**c**) Temperature detection.

During the experiment, the air temperature is 21.4 °C. As shown in Figure 16, the real-time body temperature of the current pedestrian is detected by the automated patrol vehicle and marked accurately on the pedestrian's body without abnormal values, so the proposed algorithm is feasible. The accuracy of our algorithm in detecting pedestrian body temperature is then verified through comparative experiments. Since the temperature of the same object detected by different devices at different times varies, the concurrent thermometer readings are the most convincing reference. We therefore take the thermometer measurements as the accurate body temperature and compute the detection error of our algorithm by comparison, as shown in Figures 17-19.

**Figure 17.** Comparison of pedestrian body temperature detection results at a distance of 3 m. (**a**) Temperature detection results. (**b**) Detection error.

**Figure 18.** Comparison of pedestrian body temperature detection results at a distance of 4 m. (**a**) Temperature detection results. (**b**) Detection error.

**Figure 19.** Comparison of pedestrian body temperature detection results at a distance of 5 m. (**a**) Temperature detection results. (**b**) Detection error.

The detection distance is fixed at 3, 4 and 5 m. The sampling interval is set to three seconds, and ten consecutive body temperature detections are compared with the thermometer readings. Owing to the position of the external blackbody and the correction algorithm, pedestrian body temperature detection at 3 m is the most accurate, as shown in Figure 17: compared with the thermometer measurements, the absolute error is less than 0.8 °C in all ten records. Considering safety on campus, the automated patrol vehicle has to maintain a reasonable distance from pedestrians. Keeping the vehicle about 3 m from the crowd target therefore allows pedestrian body temperature to be detected stably and with high precision.

The above experiments verify, in turn, the feasibility and accuracy of the pedestrian data acquisition algorithm, the optimal threshold for crowd target clustering, the density estimation, and the accuracy of body temperature detection. The experimental results show that the method works stably in the campus environment.

#### **5. Conclusions**

In this paper, to reduce the risk of cross-infection and improve sensing accuracy, we proposed a novel multi-sensor data fusion method, a multi-layer fusion method and a temperature detection method. These methods were applied on an automated patrol vehicle, which improves the flexibility of detection. The experimental results demonstrated that the pedestrian detection results improve by more than 17% compared to the single-sensor algorithm, that the density estimation results are more intuitive after applying the multi-layer fusion method, and that the body temperature detection error relative to a thermometer is less than 0.8 °C. Due to limitations of equipment cost and the size of the automated patrol vehicle, the detection of targets to the sides and rear of the vehicle body needs to be improved, which can be addressed by adding sensors in future work.

**Author Contributions:** Conceptualization, P.W.; methodology, C.L.; software, C.L. and H.Y.; validation, Y.W. and H.Y.; formal analysis, Y.W.; investigation, P.W.; resources, P.W.; data curation, C.L.; writing—original draft preparation, C.L.; writing—review and editing, P.W.; visualization, C.L.; supervision, P.W.; project administration, P.W.; funding acquisition, P.W. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Beijing Natural Science Foundation (grant number 4212034).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** All data and models used during the study appear in this article.

**Acknowledgments:** The authors would like to thank X. Liu and X. Wang for their technical assistance with the experiments and analysis.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **MALS-Net: A Multi-Head Attention-Based LSTM Sequence-to-Sequence Network for Socio-Temporal Interaction Modelling and Trajectory Prediction**

**Fuad Hasan and Hailong Huang \***

Department of Aeronautical and Aviation Engineering, The Hong Kong Polytechnic University, Hong Kong, China

**\*** Correspondence: hailong.huang@polyu.edu.hk

**Abstract:** Predicting the trajectories of surrounding vehicles is an essential task in autonomous driving, especially in a highway setting, where minor deviations in motion can cause serious road accidents. The future trajectory prediction is often not only based on historical trajectories but also on a representation of the interaction between neighbouring vehicles. Current state-of-the-art methods have extensively utilized RNNs, CNNs and GNNs to model this interaction and predict future trajectories, relying on a very popular dataset known as NGSIM, which, however, has been criticized for being noisy and prone to overfitting issues. Moreover, transformers, which gained popularity from their benchmark performance in various NLP tasks, have hardly been explored in this problem, presumably due to the accumulative errors in their autoregressive decoding nature of time-series forecasting. Therefore, we propose MALS-Net, a Multi-Head Attention-based LSTM Sequence-to-Sequence model that makes use of the transformer's mechanism without suffering from accumulative errors by utilizing an attention-based LSTM encoder-decoder architecture. The proposed model was then evaluated in BLVD, a more practical dataset without the overfitting issue of NGSIM. Compared to other relevant approaches, our model exhibits state-of-the-art performance for both short and long-term prediction.

**Keywords:** vehicle trajectory prediction; autonomous driving; LSTM; transformer; multi-head attention

#### **1. Introduction**

Autonomous driving has been gaining an increasing interest due to its guarantee of safer driving on the road. Recently, one of the core functions of autonomous driving that has been of particular interest in the literature is the future trajectory prediction of neighboring vehicles. In a congested highway driving scenario, where all cars are often driving at very high speeds, even a mildly reckless maneuver can consequentially result in a ripple effect causing a serious accident, which is why an autonomous car needs to forecast the future trajectories of its surrounding vehicles to assess the risk of its own future maneuvers.

However, predicting the future trajectories of surrounding vehicles has been extremely challenging in practice since the prediction is not only dependent on the historical trajectories of the vehicles but also the dynamic and complicated socio-temporal interdependence [1] of the dense web of vehicular traffic around the car, including other cars, buses, trucks, etc. For instance, in a dense highway setting, if one driver tries to change the lane, the neighboring lane driver might slow down to give way and the current lane driver will speed up to take the driver's place. Hence, to make an accurate prediction of the future trajectory, a proper model of the interaction between the participants needs to be utilized as one of the parameters for the prediction besides only raw historical trajectories of the surrounding vehicles.

**Citation:** Hasan, F.; Huang, H. MALS-Net: A Multi-Head Attention-Based LSTM Sequence-to-Sequence Network for Socio-Temporal Interaction Modelling and Trajectory Prediction. *Sensors* **2023**, *23*, 530. https:// doi.org/10.3390/s23010530

Academic Editor: Arturo de la Escalera Hueso

Received: 16 November 2022 Revised: 12 December 2022 Accepted: 19 December 2022 Published: 3 January 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

To tackle the complexity of modeling interaction, the past few years have seen the emergence of a plethora of deep-learning-based data-driven models. To model the influence of neighboring vehicles on the target or ego vehicle, i.e., the social interaction, typical deep-learning operations that have been explored include various pooling techniques [2] with convolution [3], graphs, or both [4]. These have often been coupled with an RNN such as an LSTM [5] or GRU [6] to exploit its inherent memory capability and extract temporal correlation. However, these models often struggle to capture complex long-term temporal correlations. Transformers were introduced to the literature with the promise of tackling long-term temporal correlation as well as parallelizing the decoding process. Inspired by their distinctive attention mechanism, various attention-based techniques have been adopted in [1,7]. However, the multi-head attention mechanism, originally proposed in the traditional transformer [8], has not been explored extensively in the highway trajectory prediction problem, mainly because of the accumulative errors resulting from the autoregressive decoding procedure of transformers [9].

Moreover, most studies on highway trajectory prediction have adopted the popular NGSIM dataset [10] to evaluate their performance. The NGSIM dataset was collected mainly to provide data with explicit microscopic traffic flow and car-following information for advancing traffic flow theory [11]. However, the NGSIM dataset was found to be extremely noisy in the literature [12-14]. Various efforts have been made to smooth out the noise, such as low-pass filtering and interpolation [15]. Ref. [11] demonstrated that, despite these efforts, the problem persists, so training on the NGSIM dataset risks producing a model with overfitted parameters that is unfeasible to deploy in practical applications.

In this study, we thus propose a transformer-based LSTM sequence-to-sequence model to tackle the interaction-aware trajectory prediction problem. We exploit the transformer's main mechanism, multi-head attention (*MHA*), and implement it in an *MHA*-LSTM fused sequence-to-sequence architecture to tackle the autoregressive accumulative-error problem of the transformer decoder. The encoder in our model encodes both the social and the temporal interaction of the vehicles by applying two multi-head attention layers, which attend over the vehicle-vehicle and timestep-timestep graphs, followed by a traditional LSTM layer. This is followed by a multi-head-attention-based decoder with an LSTM layer that decodes by focusing on the socio-temporal interaction between the vehicles at successive decoding steps, which helps prevent the model from accumulating autoregressive decoding errors and also improves future trajectory prediction accuracy. We summarize the contribution and the academic as well as the practical novelty of our original work as follows.


The remainder of this article is constructed as follows. Section 2 provides an overview of the literature review on the general vehicle future trajectory prediction problem. Section 3 formulates the main trajectory prediction problem that we focus on and also provides an in-depth look on the model we propose. Section 4 mentions the implementation details and also reports our experimental results from both comparative and ablative studies. Finally, Section 5 concludes this paper and discusses future directions.

#### **2. Related Work**

The earliest trajectory prediction literature focused on physical model-based techniques, which predict the future motion of a vehicle based on kinematics, statistics or a fusion of both. Popular approaches include the Constant Velocity (CV) [16] and Constant Acceleration (CA) [17] models, Kalman filter-based motion models [18], the Gaussian Mixture Model (GMM) [19] and the Hidden Markov Model (HMM) [20]. Building on CV and CA, Ref. [21] developed an intelligent driver model. Further work [22] proposed fusing a GMM with an HMM, using parameters such as vehicle motion estimates, traffic patterns and spatial interaction to predict the motion of surrounding vehicles. Predicting the trajectories of surrounding vehicles, or sometimes the ego vehicle, has mainly served risk assessment for safe automated driving, as in [23,24], where Rapidly-exploring Random Trees (RRT) and Linear Matrix Inequalities (LMI) have often been implemented to analyze the driving risk of lane-change maneuvers or to construct a human-machine shared control system [25]. However, physical models have been observed to have a very short prediction horizon, beyond which their accuracy becomes unfeasible [26]. This has pushed research toward data-driven methods.

Early data-driven work explored simpler learning-based models for interaction-aware trajectory prediction, such as support vector machines [27,28] and Gaussian process regression [29]. However, these models require a handful of manually designed features to be completely constructed. As the increasing complexity of spatio-temporal interdependence modelling was recognized, handcrafted features and hyperparameters became unfeasible. Thus, deeper learning models, which learn their own feature representations, proved to be a breakthrough.

At first, most deep learning models focused on extracting temporal correlation via RNN architectures such as LSTM [30] and GRU [31], using only the historical trajectory to make the future prediction [32,33]. Other deep-learning approaches adopted LSTMs, exploiting their time-series modelling ability, for energy-aware driving behavior analysis as well as motion prediction [34]. Owing to the encoder-decoder sequence-to-sequence architecture of RNNs, they lacked the ability to model any social or spatial interaction. More recent literature has thus focused explicitly on coupling RNNs with a social feature extraction architecture to model vehicle interaction. The approach in [35] fused a statistical GMM with LSTMs to generate the leading-vehicle trajectory. The most popular such approaches, however, have mostly relied on other neural architectures such as CNNs and GNNs.

One of the widely used architectures to model social interaction has been the convolution and social pooling approach. One of the first social pooling approaches was proposed in [2] which used an LSTM along with a social pooling strategy (S-LSTM) to decode the prediction. In [36], a social GAN (SGAN) was used which utilizes both sequence-to-sequence architecture and a GAN to make the final prediction. A convolutional social pooling approach was proposed in [3] where the convolutional kernel-striding was applied to social pooling to improve on the LSTM social pooling approach in [2]. A CNN-LSTM approach was proposed in [37] which used an LSTM encoder-decoder architecture and performed a CNN operation on the hidden-layer tensors.

Graph-based architectures have also been frequently used for predicting future trajectories. Modelling the interaction-aware trajectory prediction of surrounding vehicles as graphs with nodes being the vehicles and the edges being the interaction between them has often been implemented in the literature. Such an approach was implemented in [6] which constructed a GNN to make the future trajectory prediction.

Attention-based networks, however, have been a recent breakthrough, because of their ability to rapidly extract important information from historical tracks. Ref. [38] applies a message-passing architecture to focus on pedestrian motion in order to make a future prediction. Another model which also focuses on pedestrian trajectory prediction is [39], wherein two attention layers are stacked to extract both spatial and temporal attention. Graphs have also been utilized in the attention mechanism. The reference [40] utilizes social graph attention to model both social and temporal interaction based on relative positions. However, the multi-head attention mechanism of the transformer does not appear to have been used as extensively in vehicle trajectory prediction. Its utilization mainly spans pedestrian trajectory prediction.

#### **3. Methodology**

#### *3.1. Problem Formulation*

In this paper, the future vehicle trajectory prediction problem is formulated as a nonlinear regression task where the inputs to the model are the past trajectories of the observed neighbouring vehicles, which can be represented by

$$X = \{p\_1, p\_2, \dots, p\_T\},\tag{1}$$

where

$$p\_t = \left\{ (x\_t^0, y\_t^0), (x\_t^1, y\_t^1), \dots, (x\_t^N, y\_t^N) \right\}. \tag{2}$$

$p\_t^i = (x\_t^i, y\_t^i)$ are the *x* and *y* coordinates of vehicle *i* at timestep *t*, where $t \in [1, T]$ and $i \in [1, N]$; *T* is the total length of the observed trajectory and *N* is the total number of observed neighbouring vehicles within 30 m of the target vehicle, i.e., $\left\| p\_t^{target} - p\_t^i \right\| \leq 30$.
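The 30 m neighbour rule can be sketched as a simple distance filter (a hedged illustration; function and array names are not from the paper):

```python
import numpy as np

def observed_neighbours(p_target, p_all, radius=30.0):
    """Return indices of vehicles within `radius` metres of the target,
    i.e. those satisfying ||p_t^target - p_t^i|| <= radius."""
    d = np.linalg.norm(p_all - p_target, axis=1)
    return np.flatnonzero(d <= radius)

p_target = np.array([0.0, 0.0])
p_all = np.array([[5.0, 5.0], [25.0, 10.0], [40.0, 0.0]])
idx = observed_neighbours(p_target, p_all)
```

Only vehicles passing this filter contribute to the observed trajectories $X$.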

Based on the past trajectories, the objective of the proposed model is to learn a nonlinear regression function that predicts the coordinates of all observed vehicles over the prediction horizon *F*. The predicted trajectories of the neighbouring vehicles are represented as $\hat{Y}$, as follows, which approximates the ground-truth (GT) trajectory $Y = \{q\_{T+1}, q\_{T+2}, \dots, q\_{T+F}\}$:

$$\hat{Y} = \{\hat{q}\_{T+1}, \hat{q}\_{T+2}, \dots, \hat{q}\_{T+F}\},\tag{3}$$

where

$$\hat{q}\_t = \left\{ (\hat{x}\_t^0, \hat{y}\_t^0), (\hat{x}\_t^1, \hat{y}\_t^1), \dots, (\hat{x}\_t^N, \hat{y}\_t^N) \right\}. \tag{4}$$

The frame of reference of the dataset used in this paper is ego-centric by default, as the trajectories were extracted from onboard sensors. The longitudinal y-axis points in the ego vehicle's direction of travel, and the lateral x-axis is perpendicular to it, with the right-hand side taken as positive according to the dataset [41]. As a result, the model is independent of the curvature of the road, which conveniently allows it to be used on the highway as long as object-detection and lane-estimation algorithms run on the target vehicle [3].

#### *3.2. Network Architecture*

The high-level architecture of the proposed model is shown in Figure 1. As discussed previously, it is structured as a sequence-to-sequence architecture capable of extracting socio-temporal correlation from past observed trajectories and then generating future trajectories. The raw input $X = \{p\_1, p\_2, \dots, p\_T\}$ is first preprocessed into a suitable tensor. The encoder module then encodes the socio-temporal attention into the input via two distinct multi-head attention layers and encodes temporal memory via the LSTM layer. The decoder then decodes the encoded information into future predictions $\hat{Y} = \{\hat{q}\_{T+1}, \hat{q}\_{T+2}, \dots, \hat{q}\_{T+F}\}$. The hidden features are passed to successive decoding steps via further *MHA* layers, which improves later predictions and also prevents the traditional transformer decoding accumulation error by enriching the hidden features.

**Figure 1.** Proposed MALS-Net Architecture.

#### *3.3. Input Representation*

To input the trajectory data into our proposed model, the coordinates are first scaled to the range (-1, 1) to help the model reach faster convergence [42]. The normalized coordinates are then preprocessed into a tensor $X \in \mathbb{R}^{N \times T \times D}$, where *D* is the feature dimension, which for our input is 2, representing the (*x*, *y*) coordinates of the vehicles.

This tensor is then run through two embedding layers simultaneously. The first embedding layer *IE* (see Figure 1), a multi-layer perceptron (MLP), maps the raw input features to a higher dimension $D\_{IE}$, as shown in Equation (5). The second embedding layer *DE*, in Equation (6), maps the relative distance between each pair of vehicles, $p\_t^i - p\_t^j$ for $i, j \in [1, N]$ and $t \in [1, T]$, to a higher dimension $D\_{DE}$. The two feature representations are then concatenated to produce the final input to the model with feature dimension $D\_{model} = D\_{IE} + D\_{DE}$. The final input thus contains both the embedded positions and the embedded relative distances between vehicles.

$$IE\_t^i = \text{MLP}\left(p\_t^i, W\_{IE}\right) \tag{5}$$

$$DE\_t^i = \text{MLP}\left(p\_t^i - p\_t^j, W\_{DE}\right) \tag{6}$$
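As a rough illustration of this preprocessing, the sketch below builds the concatenated embedding in plain NumPy, with single-layer ReLU perceptrons standing in for the paper's MLPs; all sizes, and the use of vehicle 0 as the reference for relative distances, are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w):
    """Single-layer perceptron used as an embedding (illustrative)."""
    return np.maximum(x @ w, 0.0)   # linear map + ReLU

N, T, D = 4, 16, 2        # vehicles, timesteps, (x, y) features
D_IE, D_DE = 32, 32       # embedding sizes (assumed values)

X = rng.uniform(-1.0, 1.0, (N, T, D))          # coordinates scaled to (-1, 1)
W_IE = rng.normal(0, 0.1, (D, D_IE))
W_DE = rng.normal(0, 0.1, (D, D_DE))

# IE embeds raw positions; DE embeds relative distances (here to
# vehicle 0, as a stand-in for all pairwise distances).
IE = mlp(X, W_IE)
DE = mlp(X - X[0:1], W_DE)
X_emb = np.concatenate([IE, DE], axis=-1)       # D_model = D_IE + D_DE
```

The concatenation along the feature axis yields the model input of width $D\_{model}$.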

#### *3.4. Social Multi-Head Attention*

The Social Multi-Head Attention (*SMHA*) layer, shown in Figure 2, is proposed to extract the rich inter-vehicular social interaction information from the input. The embedded input $X\_{emb} \in \mathbb{R}^{N \times T \times D\_{model}}$, produced by the *IE* and *DE* layers, contains rich higher-dimensional information about both the trajectories and the relative distances of the vehicles. However, with this multitude of features, extracting a viable correlation requires putting greater weight on the features that actually contribute to a change in the result; this is the job of an attention layer. This first multi-head attention layer enables the model to learn which features to attend to most. The embedded input is first transposed to $(X\_{emb})^T \in \mathbb{R}^{T \times N \times D\_{model}}$ so that attention weights can be extracted from the last two dimensions, $\mathbb{R}^{N \times D\_{model}}$. The features are then split into $n\_{heads}$ transformer heads as $\mathbb{R}^{n\_{heads} \times T \times N \times \frac{D\_{model}}{n\_{heads}}}$ and fed into the Multi-Head Attention (*MHA*) layer. The *MHA* first computes the query (Q), key (K) and value (V) matrices at timestep *t*, as shown in Equation (7), via three MLPs that preserve the feature dimension $\frac{D\_{model}}{n\_{heads}}$.

$$Q\_t = \text{MLP}\left(\left\{x\_t^i\right\}\_{i=1}^N, w\_q\right), \quad K\_t = \text{MLP}\left(\left\{x\_t^i\right\}\_{i=1}^N, w\_k\right), \quad V\_t = \text{MLP}\left(\left\{x\_t^i\right\}\_{i=1}^N, w\_v\right) \tag{7}$$

**Figure 2.** The *SMHA* block.

The weights $w\_q$, $w\_k$ and $w\_v$ are initialized as samples from the Xavier normal distribution [43], $\mathcal{N}(0, \sigma^2)$, where

$$\sigma = k \times \sqrt{\frac{2}{l\_{in} + l\_{out}}}.\tag{8}$$

The gain *k* is set to 1, and $l\_{in}$ and $l\_{out}$ are the input and output feature dimensions, which in this case are both $\frac{D\_{model}}{n\_{heads}}$.
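Equation (8) is the standard Xavier (Glorot) normal initializer, which might be sketched as follows (function name is illustrative):

```python
import numpy as np

def xavier_normal(l_in, l_out, gain=1.0, rng=None):
    """Sample a weight matrix from N(0, sigma^2) with
    sigma = gain * sqrt(2 / (l_in + l_out)), per Equation (8)."""
    rng = rng or np.random.default_rng()
    sigma = gain * np.sqrt(2.0 / (l_in + l_out))
    return rng.normal(0.0, sigma, size=(l_in, l_out))

# With l_in = l_out = 64, sigma = sqrt(2/128) = 0.125.
w_q = xavier_normal(64, 64, rng=np.random.default_rng(0))
```

This keeps activation variance roughly constant across layers, which is why it suits the attention MLPs.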

The inter-vehicular interaction is then represented by an undirected graph $G\_t = \{V\_t, E\_t\}$, where the nodes $V\_t = \{v\_i \mid i = 1, 2, \dots, N\}$ represent the observed vehicles and the edges take binary values: $E\_t[i][j] = 1$ if $\left\| p\_t^i - p\_t^j \right\| \leq d\_{near}$, and 0 otherwise. The interaction graph is used to mask the social attention.

The masked attention of head *h* at timestep *t* is then computed as

$$\text{Attention}\_h(Q\_t, K\_t, V\_t) = \text{Softmax}\left(\frac{\left[(q\_t^i)^T k\_t^j\right]\_{mask}}{\sqrt{d\_k}}\right) \left[v\_t^j\right]\_{mask} \tag{9}$$

where $mask = \{j \mid G\_t[i, j] = 1,\ j \in [1, N]\}$. The masking based on the interaction graph $G\_t$ means attention scores are calculated only for vehicles near the target, as defined by the threshold $d\_{near}$. In practice, this resembles a real driving scenario: a driver's decisions are based only on cars that are observable and nearby. If a car cannot be observed or is not nearby, there is no interaction to score in that scenario. The threshold $d\_{near}$ is therefore designed to represent the region around the driver within which the driving decision is affected. The masked multi-head attention on the social correlation can then be computed as

$$\text{MHA}(Q\_t, K\_t, V\_t) = \text{Concat}(\text{Attention}\_1, \dots, \text{Attention}\_{n\_{heads}}).\tag{10}$$
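Equations (9) and (10) amount to distance-masked scaled dot-product attention over the vehicles at one timestep. A minimal single-head NumPy sketch (all names and the $d\_{near}$ value are illustrative assumptions; the per-head concatenation of Equation (10) is omitted):

```python
import numpy as np

def masked_social_attention(Q, K, V, pos, d_near=10.0):
    """Scaled dot-product attention over vehicles at one timestep,
    masked by the interaction graph: scores between vehicles farther
    apart than `d_near` are set to -inf before the softmax, so they
    receive zero attention weight. Q, K, V have shape (N, d_k)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # (N, N)
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    scores = np.where(dist <= d_near, scores, -np.inf)    # mask far pairs
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)                 # row softmax
    return w @ V

rng = np.random.default_rng(0)
N, d_k = 4, 8
# Two pairs of vehicles ~50 m apart: pairs attend only to each other.
pos = np.array([[0.0, 0.0], [3.0, 0.0], [50.0, 0.0], [53.0, 0.0]])
Q, K, V = (rng.normal(size=(N, d_k)) for _ in range(3))
out = masked_social_attention(Q, K, V, pos)
```

A full *MHA* layer would repeat this per head and concatenate the head outputs along the feature axis.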

The output of the multi-head attention layer, $O\_{MHA} \in \mathbb{R}^{T \times N \times D\_{model}}$, is then fed through an intra-layer MLP, which helps boost training speed, followed by an Add & Norm layer. The Add layer is simply a residual connection that adds the transposed input $(X\_{emb})^T \in \mathbb{R}^{T \times N \times D\_{model}}$ to the current output. Such connections greatly help models like transformers converge faster by resolving the vanishing gradient problem, so they are used extensively in our model [44]. The output of the Social Multi-Head Attention layer is thus computed as

$$O\_{SMHA} = \text{Norm}(\text{MLP}(O\_{MHA}) + (X\_{emb})^T). \tag{11}$$

The output $O\_{SMHA}$ has the same dimensions as the transposed input, $(X\_{emb})^T \in \mathbb{R}^{T \times N \times D\_{model}}$.
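The Add & Norm step of Equation (11) can be sketched as a residual connection followed by layer normalization over the feature axis (the intra-layer MLP is omitted, and all shapes are illustrative):

```python
import numpy as np

def add_and_norm(sub_out, residual, eps=1e-5):
    """Residual connection plus layer normalization over the feature
    axis: the last axis of (sub_out + residual) is rescaled to zero
    mean and unit variance, as in Norm(MLP(O_MHA) + (X_emb)^T)."""
    x = sub_out + residual
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
O = rng.normal(size=(16, 4, 64))     # (T, N, D_model): sublayer output
X = rng.normal(size=(16, 4, 64))     # transposed embedded input
O_SMHA = add_and_norm(O, X)
```

The residual path is what carries gradients past the attention sublayer during training.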

#### *3.5. Positional Encoding*

At this point, the *SMHA* layer has extracted the social correlation via its weights and biases, and we now need to extract the temporal correlation from $O\_{SMHA}$. For modeling temporal dependencies, order is crucial. Although the multi-head attention mechanism is powerful enough to process a long sequence very quickly thanks to its attention-graph-based architecture, in doing so it loses the sense of the order of each point in the sequence. Order was not particularly useful in the *SMHA* step, where it is redundant for social interaction between vehicles, but it is imperative that the output $O\_{SMHA} \in \mathbb{R}^{T \times N \times D\_{model}}$ preserve its sense of order before being pushed through the *TMHA* step; otherwise, large errors are produced in the decoding step. We therefore propose a Positional Encoding layer (*PE*) that injects the sequential order into the data via sinusoidal positional encoding. First, $O\_{SMHA}$ is transposed back to $\mathbb{R}^{N \times T \times D\_{model}}$, and the positional encoding matrix $P \in \mathbb{R}^{T \times D\_{model}}$ is initialized. Each scalar $P\_t^d$, with $t \in [0, T]$ and $d \in [0, D\_{model}]$, is a sinusoidal function as follows:

$$\begin{cases} \begin{aligned} PE(t, 2d) &= \sin\left(\frac{t}{10{,}000^{2d/D\_{model}}}\right), \\ PE(t, 2d+1) &= \cos\left(\frac{t}{10{,}000^{2d/D\_{model}}}\right). \end{aligned} \end{cases} \tag{12}$$

The above operation is applied for *t* ∈ [1, *T*], and the positional encoding matrix *P* ∈ R*T*×*Dmodel* is then added to the features of each vehicle *i* ∈ [1, *N*] to obtain *OPE* ∈ R*N*×*T*×*Dmodel*, the order-infused input to the *TMHA* layer.
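The sinusoidal encoding of Equation (12) can be sketched in NumPy as follows. This is a minimal illustration; the function name and the toy shapes are ours, not from the released code:

```python
import numpy as np

def positional_encoding(T, d_model):
    """Sinusoidal positional encoding matrix P of shape (T, d_model):
    even feature columns use sin, odd columns use cos, with the
    frequency decaying as 10000^(2d/D_model)."""
    P = np.zeros((T, d_model))
    t = np.arange(T)[:, None]                   # timestep index, column vector
    d = np.arange(0, d_model, 2)[None, :]       # even feature indices 0, 2, 4, ...
    angle = t / np.power(10000.0, d / d_model)  # t / 10000^(2d/D_model)
    P[:, 0::2] = np.sin(angle)
    P[:, 1::2] = np.cos(angle)
    return P

# Inject order: broadcast-add P to every vehicle's feature sequence.
N, T, d_model = 4, 8, 16
O_smha = np.random.randn(N, T, d_model)
O_pe = O_smha + positional_encoding(T, d_model)[None, :, :]
```

Because *P* depends only on (*t*, *d*), the same matrix is shared by all *N* vehicles via broadcasting.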

#### *3.6. Temporal Multi-Head Attention*

In order to model and extract the temporal correlation of each vehicle from the input trajectory data, we propose another multi-head attention layer (*TMHA*), illustrated in Figure 3, similar to the *SMHA*. The feature space of *OPE* is first split into *nheads* transformer heads as R*nheads*×*T*×*N*×(*Dmodel*/*nheads*) and fed into a Multi-Head Attention layer (*MHA*). The *MHA* first feeds the input into three MLPs to compute the query (Q), key (K), and value (V) matrices for vehicle *i*, as shown in Equation (13), which preserves the per-head feature size of *Dmodel*/*nheads*.

$$Q\_i = \text{MLP}\left(\left\{o\_t^{i}\right\}\_{t=1}^T, w\_{q}\right), \quad K\_i = \text{MLP}\left(\left\{o\_t^{i}\right\}\_{t=1}^T, w\_{k}\right), \quad V\_i = \text{MLP}\left(\left\{o\_t^{i}\right\}\_{t=1}^T, w\_{v}\right) \tag{13}$$

The weights *wq*, *wk*, and *wv* are again initialized as samples from the Xavier normal distribution [43], N(0, *σ*2), as discussed for Equation (8) above.
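A minimal sketch of this initialization, assuming the standard Glorot formula *σ* = √(2/(fan_in + fan_out)); the helper name and the example sizes are ours:

```python
import numpy as np

def xavier_normal(fan_in, fan_out, gain=1.0, rng=None):
    """Xavier (Glorot) normal initialization: samples from N(0, sigma^2)
    with sigma = gain * sqrt(2 / (fan_in + fan_out))."""
    rng = rng or np.random.default_rng(0)
    sigma = gain * np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, sigma, size=(fan_in, fan_out))

# e.g. projecting D_model = 128 features down to one head's size, 128 / 4
w_q = xavier_normal(128, 128 // 4)
```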

The masked attention of head *h* for the *i*-th vehicle is then computed as:

$$\text{Attention}\_{h}(Q\_{i}, K\_{i}, V\_{i}) = \text{Softmax}\left(\frac{\left[\left(q\_{t}^{i}\right)^{T}k\_{u}^{i}\right]\_{u=1}^{T-1}}{\sqrt{d\_{k}}}\right)\left[v\_{u}^{i}\right]\_{u=1}^{T-1} \tag{14}$$

where the restriction of the index *u* to earlier timesteps implements the time-masking. Essentially, the time mask prevents the current step from accessing features from its relative future: if the current timestep is *s*, then the features from timesteps *s* to *T* are masked as zero. This prevents the model from overfitting by forming spurious correlations on the training data.
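The time-masking can be illustrated with a minimal single-head NumPy sketch; the helper name and the large negative masking constant are illustrative assumptions:

```python
import numpy as np

def masked_attention(Q, K, V):
    """Scaled dot-product attention with a causal (time) mask:
    scores for future timesteps are suppressed before the softmax,
    so step t can only attend to steps up to t."""
    T, d_k = Q.shape
    scores = Q @ K.T / np.sqrt(d_k)                   # (T, T) attention scores
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)  # True above diagonal = future
    scores = np.where(mask, -1e9, scores)             # mask out future steps
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V

T, d_k = 5, 8
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(T, d_k)), rng.normal(size=(T, d_k)), rng.normal(size=(T, d_k))
out = masked_attention(Q, K, V)   # out[0] depends only on V[0]
```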

**Figure 3.** The *TMHA* block.

The Multi-Head Attention to the temporal dependency can then be calculated as

$$\text{MHA}(Q\_i, K\_i, V\_i) = \text{Concat}(\text{Attention}\_1, \dots, \text{Attention}\_{n\_{heads}}).\tag{15}$$

The output of the Multi-Head Attention layer, *OMHA* ∈ R*N*×*T*×*Dmodel*, is then fed through an intra-layer MLP, followed by an Add & Norm layer, which provides the residual connection to the output of the previous layer, *OPE* ∈ R*N*×*T*×*Dmodel*. After the residual connection and normalization, the output is fed through another multi-layer perceptron network (*FFN*). The *FFN* is made up of two feed-forward networks that successively map the feature vectors to a higher dimension and back to the original dimension, such that the output of MLP1 is in R*N*×*T*×*DFFN* and MLP2 maps it back to R*N*×*T*×*Dmodel*. This adds model parameters so that the temporal attention vectors can be fed through further layers such as an RNN. The output of the Temporal Multi-Head Attention layer is thus computed as follows,

$$O\_{\rm TMHA} = \text{FFN}(\text{Norm}(\text{MLP}(O\_{\rm MHA}) + O\_{\rm PE}), W\_{\rm FFN}), \tag{16}$$

where

$$\text{FFN} = \text{MLP2}(\text{MLP1}(\text{FFN}\_{\text{in}}, w\_{\text{MLP1}}), w\_{\text{MLP2}}).\tag{17}$$
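Equation (17) amounts to a position-wise two-layer MLP. A minimal NumPy sketch follows; the ReLU between the layers and the toy dimensions are assumptions, as the paper does not name the activation:

```python
import numpy as np

def ffn(x, w1, b1, w2, b2):
    """Position-wise feed-forward block: MLP1 expands the feature
    dimension D_model -> D_FFN, MLP2 projects it back to D_model."""
    h = np.maximum(0.0, x @ w1 + b1)   # (N, T, D_FFN); ReLU is an assumption
    return h @ w2 + b2                 # (N, T, D_model)

N, T, d_model, d_ffn = 4, 8, 16, 64
rng = np.random.default_rng(0)
x = rng.normal(size=(N, T, d_model))
w1, b1 = rng.normal(size=(d_model, d_ffn)), np.zeros(d_ffn)
w2, b2 = rng.normal(size=(d_ffn, d_model)), np.zeros(d_model)
y = ffn(x, w1, b1, w2, b2)
```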

Thus the output of the *TMHA* layer now has both social and temporal correlation encoded into it. We refer to the *SMHA*, *PE* and *TMHA* layers compounded together as S*TMHA*. We then stack *M* S*TMHA* layers successively to extract more complex socio-temporal dependence from the past trajectory information.

#### *3.7. LSTM Encoder*

We propose a traditional LSTM encoder after the S*TMHA* layer stack to extract the hidden memory information from the output tensor *OSTMHA* <sup>∈</sup> <sup>R</sup>*N*×*T*×*Dmodel* . This is done to facilitate the hidden tensor passing to the LSTM decoder module. We propose this sequence-to-sequence architecture as a replacement for the traditional transformer architecture with an *MHA*-based transformer decoder. As mentioned before, due to the autoregressive nature of the transformer decoder, it carries the problem of accumulated errors. We thus use a sequence-to-sequence LSTM decoder to solve this problem. At first, the LSTM encoder layer recurrently computes the hidden-state tensor *Ht* <sup>∈</sup> <sup>R</sup>*L*×*N*×*Dhid* at time *t* for *t* ∈ [1, *T*] with *L* encoder layers and *Dhid* hidden dimensions as

$$
o\_t^l, h\_t^l = \text{LSTM}(h\_{t-1}^{l}, o\_{t-1}^{l}; w^{l}). \tag{18}
$$

After recurrently updating the hidden state for *<sup>T</sup>* timesteps, the *HT* <sup>∈</sup> <sup>R</sup>*L*×*N*×*Dhid* at the final timestep *T* is then passed on to the decoder module.
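The recurrence in Equation (18) can be illustrated with a single plain LSTM cell in NumPy. This is a textbook cell, not the authors' implementation; the gate layout and sizes are illustrative:

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One step of a plain LSTM cell; the encoder applies this recurrently
    for t = 1..T and hands the final hidden state H_T to the decoder.
    Gate layout in W/U/b: [input, forget, cell, output]."""
    d = h.shape[0]
    z = W @ x + U @ h + b
    i = 1.0 / (1.0 + np.exp(-z[0:d]))       # input gate
    f = 1.0 / (1.0 + np.exp(-z[d:2*d]))     # forget gate
    g = np.tanh(z[2*d:3*d])                 # candidate cell state
    o = 1.0 / (1.0 + np.exp(-z[3*d:4*d]))   # output gate
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

d_in, d_hid = 16, 60                        # D_hid = 60 as in Section 4.3
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * d_hid, d_in))
U = rng.normal(scale=0.1, size=(4 * d_hid, d_hid))
b = np.zeros(4 * d_hid)
h = c = np.zeros(d_hid)
for x in rng.normal(size=(8, d_in)):        # encode T = 8 toy timesteps
    h, c = lstm_step(x, h, c, W, U, b)
```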

#### *3.8. S*TMHA*-LSTM Decoder*

To decode the hidden-state tensor *HT* ∈ R*L*×*N*×*Dhid* while extracting from the decoded information the rich socio-temporal dependence encoded in the hidden tensor, we propose an S*TMHA*-LSTM decoder, as shown in Figure 4. This module consists of an LSTM decoder with a built-in multi-head attention mechanism that attends to the predicted behavior to improve predictions at successive timesteps. The first layer of this architecture is an ordinary LSTM decoder which takes an input and a hidden tensor, decodes the output, and updates the hidden tensor. The decoder's hidden state is initialized with the encoder's hidden tensor *HT*, which was recurrently updated over timesteps [1, *T*]. As input, the decoder takes the raw input data at timestep *T*, *XT* ∈ R*N*×1×2, with the last dimension denoting the position coordinates of the observed vehicles. It then decodes *HT*+1 ∈ R*L*×*N*×*Dhid* for the next timestep *T* + 1, as follows,

$$
\hat{q}\_{T+1}, H\_{T+1} = \text{LSTM}(H\_{T}, X\_{T}; w).\tag{19}
$$

**Figure 4.** The RNN encoder-decoder block.

The hidden tensor at timestep *T* + 1 is then fed into an S*TMHA* layer to extract and update the socio-temporal dependence at the next decoding step, and this is repeated for every following decoding step. This further improves the accuracy of successive decoding steps and decodes the future trajectories from the hidden information of the transformer encoder with lower autoregressive accumulated errors. Teacher-forcing is also applied to improve convergence: depending on the teacher-forcing ratio, each decoding step takes as input either the predicted output, such as *q*ˆ*T*+1, or the ground truth *XT*+1. This enables the model to decode highly accurate future coordinates of each vehicle at every timestep.
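The teacher-forcing scheme can be sketched as follows; `step_fn` is a hypothetical stand-in for one LSTM decoder step, and the toy scalars only illustrate the control flow:

```python
import random

def decode_with_teacher_forcing(step_fn, h, first_input, targets, ratio=0.5):
    """Sequence-to-sequence decoding loop with teacher forcing: each step
    consumes either the previous ground-truth target (with probability
    `ratio`) or the model's own previous prediction.
    `step_fn(x, h) -> (pred, h)` stands in for one LSTM decoder step."""
    preds, x = [], first_input
    for t in range(len(targets)):
        pred, h = step_fn(x, h)
        preds.append(pred)
        x = targets[t] if random.random() < ratio else pred
    return preds

# Toy step function: "predict" the input plus a fixed hidden offset.
step = lambda x, h: (x + h, h)
out = decode_with_teacher_forcing(step, 1.0, 0.0, targets=[1.0, 2.0, 3.0], ratio=0.0)
# With ratio=0.0 the decoder always consumes its own prediction -> [1.0, 2.0, 3.0]
```

With `ratio=0.5`, as in Section 4.3, roughly half the steps see the ground truth during training, which stabilizes convergence while still exposing the decoder to its own errors.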

#### **4. Experiments**

In this section, we will report our chosen parameters, hardware, and results from the experiments performed on the publicly available dataset BLVD in order to evaluate the performance of the predictions by our proposed MALS-Net architecture. We will also conduct comparative experiments with other state-of-the-art models as well as ablative comparisons with our chosen architecture.

#### *4.1. Dataset*

The proposed model was trained, validated, and tested on the BLVD dataset. It consists of a total of 654 high-resolution videos resulting in 120,000 data points, collected in Changshu, Jiangsu province. Each frame contains the ID, 3D coordinates, vehicle direction, and interactive-behavior information of all vehicles observed by the ego. Because the data were collected by onboard sensors, there is less filter noise than in NGSIM and more realistic sensor noise, making the dataset more practical. To divide the dataset into training, validation, and testing sets, we follow [41]. The dataset contains various scenario categories for the ego vehicle; we only choose the highway scenarios, with both high and low densities of participants. The dataset is also split between day and night. We concatenate and shuffle all of these sub-datasets before feeding them into our model.

#### *4.2. Implementation Details*

We built our model on a desktop running Windows 11 with a 3.8 GHz AMD Ryzen 7 CPU, 32 GB of RAM, and an NVIDIA 3070Ti graphics card. To train the model, we utilized the parallel High-Performance Computing Service of The Hong Kong Polytechnic University. The high-performance computing nodes comprise several industry-grade GPUs, the exact model of which is unknown to the authors.

#### *4.3. Hyperparameter Settings*

We set the observable region to a 30 m radius, based on the assumption that a human driver cannot see beyond a 30 m region around them. For the interaction graph *Gt*, we set the threshold *dnear* to be 15 m. We set the number of S*TMHA* layers *M* = 2, *nheads* = 4 for all *MHA* modules, and *Dmodel* = 128. The LSTM blocks all have *Dhid* = 60 and *L* = 2 layers. For the trajectory, we use a past horizon *T* of 3 s and a future horizon *F* of 5 s. For model training, we used the Adam optimizer [45] with *η* = 0.001 and *β* = 0.999, a learning rate of 0.0001, and a batch size of 32. The teacher-forcing ratio used is 0.5.

#### *4.4. Evaluation Metrics*

In line with the existing literature [1–6,36,37,39,40], we adopt the root mean squared error (RMSE) between the prediction and the ground truth, for ease of comparing the model's performance with other state-of-the-art methods as well as for analyzing its performance against the ground truth. The RMSE at prediction time *t*, *t* ∈ [*T* + 1, . . . , *T* + *F*], is calculated as follows,

$$\text{RMSE}\_{t} = \sqrt{\frac{1}{L} \sum\_{l=1}^{L} \left(\hat{\mathbf{Y}}\_{t}^{l} - \mathbf{Y}\_{t}^{l}\right)^{2}},\tag{20}$$

where *Y*ˆ*<sup>l</sup><sub>t</sub>* is the predicted position and *Y<sup>l</sup><sub>t</sub>* the ground-truth position of the *l*-th testing sample at timestep *t*, and *L* is the total length of the test set.
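Equation (20) can be implemented directly; here we read (*Y*ˆ − *Y*)² as the squared Euclidean distance between predicted and true xy-positions, which is an assumption about the notation:

```python
import numpy as np

def rmse_per_timestep(Y_hat, Y):
    """RMSE over the L test samples, computed separately for each
    prediction timestep. Inputs have shape (L, F, 2) =
    (samples, future steps, xy-coordinates)."""
    sq_err = np.sum((Y_hat - Y) ** 2, axis=-1)  # squared Euclidean error per sample/step
    return np.sqrt(np.mean(sq_err, axis=0))     # shape (F,)

Y = np.zeros((10, 5, 2))
Y_hat = np.ones((10, 5, 2))                     # constant 1 m offset in both x and y
print(rmse_per_timestep(Y_hat, Y))              # RMSE is sqrt(2) ≈ 1.414 at every horizon
```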

#### *4.5. Ablative Analysis*

To demonstrate the effectiveness of our architecture, we perform a variety of ablative experiments in this section. First, to verify the performance of our encoder module in encoding the social correlation, then the temporal correlation, and finally the memory information into the input, we carry out three distinct experiments, described as follows.


The results in Table 1 compare the above three models with our proposed architecture MALS-Net. The improvement score is based on the mean of the difference in RMSE over the 5 s prediction horizon. First, it can be observed that adding the *SMHA* layer improves model performance by 39.6%. This shows that the *SMHA* layer's extraction of the social context of the traffic scenario via the graph is crucial to the prediction, and the layer is thus a vital addition to our model. The *TMHA* layer is even more important, with an improvement of almost 50% from adding the layer. This confirms that, in addition to the social context, the prediction relies mainly on the historical trajectories and on self-attention over them, which our proposed model extracts proficiently via the *TMHA* layer. Encoding the memory information is also important according to our ablative results: the LSTM encoder, which encodes the hidden memory information, improves model performance by 23.5%. We can also infer that part of this improvement comes from reducing the accumulative errors caused by the transformer-based S*TMHA* encoder when no LSTM encoder subsequently extracts the hidden memory information. Figure 5 illustrates the RMSE values of the three experiments and their corresponding improvements.


**Figure 5.** Comparison of RMSE values from the Ablative Analysis on the Encoder.

Secondly, to assess the effectiveness of our proposed decoder architecture, we conduct two distinct experiments as follows.


First, we can observe the effect of the LSTM encoder-decoder architecture. Compared to an ordinary transformer-based encoder-decoder architecture without any RNN, the errors grow almost exponentially over successive decoding steps. This is the apparent effect of the accumulative errors originating from the autoregressive nature of transformer decoding. Table 2 makes clear that LSTM encoding-decoding, which passes hidden information to make predictions, is superior to transformer-based encoding-decoding, especially at future timesteps. Adding an LSTM-based encoder-decoder thus improves both the model RMSE and the autoregressive accumulating errors. Second, it is also evident that the S*TMHA* layer for the LSTM decoder is crucial in mitigating successive decoding errors: adding this layer yields an increasing improvement in RMSE over successive decoding steps. This establishes that extracting the socio-temporal interdependence of the traffic at successive decoding steps is very important for predicting trajectories further into the future. Figure 6 illustrates the RMSE values of these two experiments and their corresponding improvements.


**Table 2.** Ablative Analysis of the Decoder Module.

**Figure 6.** Comparison of RMSE values from the Ablative Analysis on the Decoder.

Our model also introduced a threshold, *dnear*, to distinguish nearby vehicles from merely observed vehicles. This threshold limits the range of social influence between vehicles, so properly designing this parameter is crucial to our proposed model. We therefore conducted further experiments to tune this threshold, choosing from a pool of four values: 10, 20, 30, and 40 m. As shown in Table 3, excessively small values of *dnear* result in poor performance, seemingly because realistic interactions between vehicles beyond the threshold are ignored. Values larger than 30 m do not produce a notable improvement, so we chose 30 m for *dnear*.


**Table 3.** Ablative Analysis of the Nearby Distance Threshold.
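Building the interaction graph with the *dnear* threshold can be sketched as follows; the function name and array layout are ours, not from the released code:

```python
import numpy as np

def interaction_edges(positions, d_near=30.0):
    """Build the interaction graph G_t: an undirected edge links two
    vehicles whose Euclidean distance is below the nearby threshold d_near.
    `positions` has shape (n_vehicles, 2) with xy-coordinates."""
    n = len(positions)
    diff = positions[:, None, :] - positions[None, :, :]   # pairwise offsets
    dist = np.linalg.norm(diff, axis=-1)                   # pairwise distances
    return [(i, j) for i in range(n) for j in range(i + 1, n) if dist[i, j] < d_near]

pos = np.array([[0.0, 0.0], [10.0, 0.0], [100.0, 0.0]])
print(interaction_edges(pos, d_near=30.0))   # only vehicles 0 and 1 interact: [(0, 1)]
```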

#### *4.6. Comparative Analysis*

We compare our model against the following models from the literature.


We demonstrate the performance of our model compared to the above models in Table 4. The table shows the capability of our model to use the strength of the transformer network and model socio-temporal interaction without suffering autoregressive errors. Our model outperforms all other baselines, with especially significant performance at the fourth- and fifth-second prediction horizons, establishing its strength in long-term predictions.

**Table 4.** Comparative Analysis.


#### *4.7. Prediction Visualization*

RMSE is generally an effective indicator of performance, but visualization is often needed to analyze the strengths and weaknesses of a model. Thus, in this section, we provide a visualization of some test cases to better understand our model's performance in depth, by comparing predicted trajectories with the ground truth. Figure 7 shows three distinct scenes, each with a different level of model performance, with the trajectories marked in different colors. The first scene shows a typical congested highway driving scenario. The ego here is surrounded by five other vehicles, all contributing to the interaction-modeling of the ego. With enough social interaction context, our model performs very well, with minimal difference between the ground-truth and predicted trajectories.

**Figure 7.** Prediction Visualization of three different scenes.

The second scene represents a fairly uncongested scene compared to the first. Our proposed model also performs relatively well in this case, with only a small deviation of the predicted trajectory from the ground truth. The performance is nevertheless lower than in the first scene. We suppose this is due to the much lower interaction context: because there are not enough cars, our proposed socio-temporal interaction modeling does not produce perfectly predicted trajectories, leaving a small yet noticeable deviation between the prediction and the ground truth.

*Vehicle sizes have been exaggerated and trajectories have been freehand-drawn from coordinates.*

The third case illustrates a scenario with multiple lane changes. The performance of the model in this case is relatively poor because the exact trajectory is ambiguous, even though the model predicts a possible lane change. It also fails to capture the deviation of the yellow vehicle's path caused by the blue vehicle taking its lane in front. We also believe there are not enough lane-change cases in the highway portion of BLVD, which possibly contributes to the poorer performance. Creating a behavior-prediction branch in the model to predict behaviors, and feeding those behaviors back into the model to improve the interaction prediction, could improve performance in lane-change scenarios, since predicting the exact time at which an intention change will occur can improve the lane-change path prediction. Pre-training the model on a dataset such as HighD [48], which contains more lane-change scenarios, may also mitigate some extreme cases.

#### **5. Conclusions**

In this article, we proposed a transformer-based LSTM encoder-decoder network to model the socio-temporal interaction and predict the future trajectories of surrounding vehicles. We used the multi-head attention mechanism of transformers to extract the social and temporal interactions individually and encoded them into the input as hidden information via the LSTM encoder in the encoder module. We then used the decoder module to decode the hidden information, with another multi-head attention layer on the decoder to improve successive decoding accuracy. We trained and evaluated our model on a practical, ego-centered, large-scale dataset derived from onboard sensor data. We conducted extensive ablative and comparative studies. Our experiments verified that our model outperforms all previous models, and the ablative studies confirmed that it mitigates the accumulative error caused by the transformer's autoregressive decoding behavior. The visualization results showed our model's strength in difficult congested scenarios, as well as its limitation in lane-change path predictions. The proposed model can be implemented in a practical driving scenario to predict the future trajectories of surrounding vehicles based on their historical tracks, provided that an object detection system such as YOLO is in place. In the future, we plan to enhance the model's performance on exact lane-change tracks by better modeling the driver intention and feeding it into the interaction procedure. As another potential future path, this model can be extended to involve more traffic participants, such as cyclists and pedestrians, and predict their behavior as well. Currently, this model can only be utilized in a freeway driving scenario; adapting it to urban driving by incorporating more complex information such as lane types, spatial HD maps, and traffic lights may be another direction worth pursuing.

**Author Contributions:** Conceptualization, H.H.; methodology, F.H.; investigation, F.H.; resources, H.H.; writing—original draft preparation, F.H.; writing—review and editing, H.H.; visualization, F.H.; supervision, H.H.; project administration, H.H. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by The Hong Kong Polytechnic University College of Undergraduate Researchers & Innovators (PolyU CURI)'s Undergraduate Research & Innovation Scheme (URIS). The Project Fund number is 9-AAE-44. Any details can be found in the following link: https://www.polyu.edu.hk/gs/ug-research/curi/about-curi/ (accessed on 15 November 2022).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** This study was evaluated on the publicly available dataset BLVD https: //drive.google.com/file/d/1fbZsIHM8Yq8TE2avDzT8xuWIBfrcFgQC/view?usp=sharing, accessed on 15 August 2022.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

## *Article* **Criticality Assessment Method for Automated Driving Systems by Introducing Fictive Vehicles and Variable Criticality Thresholds**

**Demin Nalic 1, Tomislav Mihalj 2,\*, Faris Orucevic 2, Martin Schabauer 2, Cornelia Lex 2, Wolfgang Sinz <sup>3</sup> and Arno Eichberger <sup>2</sup>**


**Abstract:** The safety approval and assessment of automated driving systems (ADS) are becoming sophisticated and challenging tasks. Because the number of traffic scenarios is vast, it is essential to assess their criticality and extract the ones that present a safety risk. In this paper, we are proposing a novel method based on the time-to-react (TTR) measurement, which has advantages in considering avoidance possibilities. The method incorporates the concept of fictive vehicles and variable criticality thresholds (VCTs) to assess the overall scenario's criticality. By introducing variable thresholds, a criticality scale is defined and used for criticality calculation. Based on this scale, the presented method determines the criticality of the lanes adjacent to the ego vehicle. This is performed by placing fictive vehicles in the adjacent lanes, which represent copies of the ego. The effectiveness of the method is demonstrated in two highway scenarios, with and without trailing vehicles. Results show different criticality for the two scenarios. The overall criticality of the scenario with trailing vehicles is higher due to the decrease in avoidance possibilities for the ego vehicle.

**Keywords:** fictive vehicles; safety assessment; scenario criticality; automated driving

#### **1. Introduction**

The assessment of ADS is essential for the approval and testing procedures. Due to a high scenario space and the complexity of ADS that needs to be tested for safety approval, scenario-based methods are promising and widely used approaches in the field of research. Recent studies, for example, [1,2], are focused on simulation-based testing and the generation of scenarios to reduce the testing effort for ADS. The reduction is achieved by identifying relevant safety-critical scenarios (SCS) using a variety of safety metrics and scenario-generation methods. A comprehensive review of current studies related to scenario-based approaches is presented in [3]. A scenario in the context of this work represents a series of actions occurring over a period of time, as defined in [4]. Other similar definitions are introduced and can be found, for example, in [5,6]. After extracting scenarios from a synthesized or real dataset, the scenarios are analyzed and assessed on their criticality. There are different definitions of what the criticality of a certain situation means. According to [7], accident probability combined with severity defines the criticality of a certain situation. The probability which is used here takes into account the avoidance of a particular situation. This approach enables a better interpretation of a specific situation. To calculate accurate avoidance possibilities, environment parameters, which are composed of static and dynamic objects, should be considered for evaluating the criticality of the situation. The available probabilistic methods, such as those in [8– 10], calculate possible evasive maneuvers or accident probabilities; based on these, they

**Citation:** Nalic, D.; Mihalj, T.; Orucevic, F.; Schabauer, M.; Lex, C.; Sinz, W.; Eichberger, A. Criticality Assessment Method for Automated Driving Systems by Introducing Fictive Vehicles and Variable Criticality Thresholds. *Sensors* **2022**, *22*, 8780. https://doi.org/10.3390/ s22228780

Academic Editors: Yafei Wang, Chao Huang, Peng Hang, Zhiqiang Zuo and Bo Leng

Received: 29 September 2022 Accepted: 9 November 2022 Published: 14 November 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

estimate whether a situation is evasive, critical, or fulfills safety criteria. This consideration is advantageous on the one hand because it takes into account a large number of possibilities and the probability or feasibility of these possibilities in the calculation. On the other hand, due to complex solutions to mathematical problems, real-time capabilities and unique solutions are only sometimes given. To overcome such issues, we introduce a novel assessment method that introduces fictive ego vehicles on the left and right lanes adjacent to the tested ego vehicle. These fictive ego vehicles represent the exact copy of the ego vehicle, i.e., the vehicle configuration and the automated driving functions are the same. In this constellation, the ego vehicle and the neighboring fictive ego vehicles are assessed individually in the respective lane according to the introduced criticality scale. Combining the criticalities of the ego and fictive vehicle, an overall criticality of the situation can be calculated. To calculate the criticality, we take the avoidability of a collision into account using the TTR metric. The TTR metric consists of the time-to-brake (TTB) and time-to-steer (TTS) measurements, which represent the times left to avoid a critical situation by braking or steering, respectively. For SCS assessment, usually, the fixed metric thresholds are defined. However, metrics often depend on the kinematic relations between the ego and surrounding objects, such as distances and velocities. Although time-based metrics capture the influence of both distance and velocities, thresholds remain changeable depending on the object states. To overcome this issue with TTR, we are introducing variable criticality thresholds (VCTs) that determine a criticality scale. Finally, combining the concept of fictive ego vehicles, the TTR, and the VCTs, we derived the novel assessment method presented in this work. 
In Section 2, we present a detailed analysis of safety metrics used in scenario-based approaches for the assessment of SCS and describe in detail the usage of the TTR. Using the TTR, the criticality definition and the assessment method are presented in Section 3. As the last step, we demonstrate the effectiveness of the presented method on a safety-relevant test example.

#### **2. Safety Metrics**

According to [11], safety metrics are classified into temporal-, distance- and deceleration-based safety metrics. In [12], additional categorizations are introduced: statistics-based, potential-field-based, and unexpected-driving-behavior-based metrics. These reviews show that temporal safety metrics are widely used to assess scenarios. The leading representative often mentioned in the context of scenario criticality is the TTC. The main advantage of the TTC lies in its simplicity and efficiency for rear-end scenarios. Its main disadvantages are its insensitivity to lateral collision risks and its inability to reflect the potential risk when the relative speed between two vehicles is zero. This implies that a possible intervention of a driver in safety-critical situations is not taken into account by the TTC. Therefore, various adjustments and adaptations of the classic TTC were made; for example, a modified time-to-collision (MTTC) measure was introduced in [13]. Compared with the conventional TTC, the MTTC considers relative acceleration and relative velocity to improve collision prediction. In [14], the worst-time-to-collision (WTTC) metric was introduced. The WTTC considers the influence of vehicle dynamics and the avoidance possibilities of the ego vehicle under the worst-case assumption regarding vehicle behavior. Another approach based on the TTC is presented in [15], where the time-exposed TTC (TET) and the time-integrated TTC (TIT) are introduced. These two metrics are extensions of the TTC and take the whole trajectory of vehicles over space and time into account. In [16], the TTR was introduced for the development and evaluation of forward-collision-avoidance systems, considering a TTB, a TTS, and a TTK. In general, these reaction times determine the remaining time for successful collision avoidance by braking, steering, or acceleration.
A similar approach is described in [17]; here, in addition to the TTS, a requested acceleration is introduced and used for the assessment of certain scenarios for ADS. The main advantages of these approaches compared with conventional temporal metrics such as the TTC and extensions of the TTC [18,19] are in consideration of avoidance possibilities and kinematic behavior of vehicles. The later the driver reacts

successfully to an impending collision, the more the system is forced to exploit the vehicle's physical limits. Using the TTR as a safety metric offers the possibility to interpret and define the criticality in a wide range of safety-relevant scenarios. The present work represents a continuation of previous studies of [16,17,20], where longitudinal and lateral kinematics are taken into account for a criticality assessment of ADS.

Time-to-react is defined as TTR = max{TTB, TTS}, where TTB and TTS are the time-to-brake and time-to-steer metrics, respectively. They represent time reserves to the last point to brake (LPTB) and the last point to steer (LPTS) [21], i.e., the points at which the ego has to perform braking or steering to avoid a collision with the vehicle in front, also called the target vehicle. For practical reasons, the times at which TTB and TTS are equal to 0 will be denoted as *τ*<sup>b</sup> and *τ*s, respectively. In their general and simplest forms, they are defined by (1) and (2). The remaining distances between the ego and the target vehicle at *τ*<sup>b</sup> and *τ*<sup>s</sup> are defined as *d*<sup>b</sup> and *d*<sup>s</sup> by (3) and (4), respectively. The *d*<sup>b</sup> is the distance that the ego travels while reducing its velocity (*v*ego) to the target's velocity *v*<sup>t</sup> by applying the maximum braking *a*min,x [21]. It is worth mentioning that *a*min,x stands for the maximum longitudinal deceleration, wherein we respect the positive and negative signs of acceleration in the longitudinal direction. Here, *d*<sup>s</sup> is calculated from the evasion time *t*ev (5) needed to change a lane with maximum lateral acceleration, *a*max,y [17,21]. Unlike in the longitudinal direction, in the lateral direction we do not consider negative signs; therefore, *a*max,y denotes the maximum acceleration, regardless of direction. These temporal and spatial relations are depicted in Figure 1, while all the parameters used to calculate the metrics and their thresholds are summarized in Table 1. Figure 1 illustrates that *τ*<sup>s</sup> occurs at a later point in time than *τ*b, which indicates steering as the last emergency intervention to avoid a collision; however, this is not always the case. Figure 2 shows the dependency of the remaining distances *d*<sup>b</sup> and *d*<sup>s</sup> on the relative velocity between the ego vehicle and the target vehicle.
Here, braking leaves more time to avoid a collision at low velocities, but at higher relative velocities, steering would be more appropriate.

$$
\tau\_\mathbf{b} = \frac{d\_\mathbf{0} - d\_\mathbf{b}}{v\_{\rm ego} - v\_\mathbf{t}} \tag{1}
$$

$$\tau\_{\rm s} = \frac{d\_0 - d\_{\rm s}}{v\_{\rm ego} - v\_{\rm t}} \tag{2}$$

$$d\_{\mathbf{b}} = -\frac{v\_{\text{rel}}^2}{2\,a\_{\text{min,x}}}\tag{3}$$

$$d\_{\rm s} = \sqrt{\frac{2 \, d\_{\rm y}}{a\_{\rm max,y}}} \, v\_{\rm rel} \tag{4}$$

$$t\_{\rm ev} = \sqrt{\frac{2 \, d\_{\rm y}}{a\_{\rm max,y}}} \tag{5}$$
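Under the constant-velocity assumption, relations (1)–(5) can be evaluated directly. The following sketch computes the remaining distances, the time reserves, and the resulting TTR; all numerical values are illustrative assumptions, not taken from the paper:

```python
import math

# Illustrative values (assumptions, not taken from the paper).
d0 = 60.0                  # current gap between ego and target vehicle, m
v_ego, v_t = 30.0, 20.0    # velocities, m/s
v_rel = v_ego - v_t
a_min_x = -8.0             # maximum braking deceleration (negative by convention), m/s^2
a_max_y = 7.0              # maximum lateral acceleration (magnitude), m/s^2
d_y = 3.5                  # lateral evasion distance (lane width), m

# Eq. (5): evasion time for a lane change at maximum lateral acceleration
t_ev = math.sqrt(2 * d_y / a_max_y)

# Eqs. (3) and (4): remaining distances for a pure braking / pure steering maneuver
d_b = -v_rel**2 / (2 * a_min_x)
d_s = t_ev * v_rel

# Eqs. (1) and (2): time reserves until the last point to brake / steer
tau_b = (d0 - d_b) / v_rel
tau_s = (d0 - d_s) / v_rel

# TTR is the later of the two last intervention points
ttr = max(tau_b, tau_s)
```

With these values, braking leaves the larger time reserve, consistent with the low-relative-velocity case discussed above.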

**Figure 1.** Visualization of steering and braking possibilities.

**Table 1.** Parameters used for safety metrics and thresholds.


**Figure 2.** Remaining distance of an ego vehicle to a target vehicle for avoiding a rear-end collision in longitudinal traffic for a pure braking maneuver and a pure steering maneuver.

The previously defined relations in (1)–(4) consider only the constant velocities of the ego and target vehicles representing only a special case and resulting in a reduced precision in the presence of variable velocities. To increase the prediction precision while keeping it feasible regarding its complexity, a straight road and constant velocity *v*<sup>t</sup> is assumed, while the ego vehicle keeps constant longitudinal acceleration *a*ego,x up to the point where evasive braking or evasive steering actions must be executed. Furthermore, considering ideal conditions, the possible maximum braking and lateral acceleration are *a*min,x = −*μ g* and *a*max,y = *μ g*, respectively, where *μ* represents the maximum available tire road friction. Considering such steady states, the remaining distances *d*<sup>b</sup> and *d*<sup>s</sup> are given by (6) and (7):

$$d\_{\mathbf{b}} = -\frac{(v\_{\text{rel}} + a\_{\text{ego,x}}\tau\_{\mathbf{b}})^2}{2\,a\_{\text{min,x}}}\tag{6}$$

$$d\_{\sf s} = \sqrt{\frac{2 \, d\_{\sf y}}{a\_{\rm max,y}}} (v\_{\rm rel} + a\_{\rm ego,x} \,\, \tau\_{\sf s}), \tag{7}$$

while *τ*<sup>b</sup> and *τ*<sup>s</sup> are determined by solving the following two quadratic equations.

$$\frac{1}{2} \left( a\_{\text{ego,x}} - \frac{a\_{\text{ego,x}}^2}{a\_{\text{min,x}}} \right) \tau\_b^2 + \left( v\_{\text{rel}} - \frac{v\_{\text{rel}} \, a\_{\text{ego,x}}}{a\_{\text{min,x}}} \right) \tau\_b - d\_0 - \frac{v\_{\text{rel}}^2}{2 \, a\_{\text{min,x}}} = 0 \tag{8}$$

$$\frac{1}{2}a\_{\text{ego,x}}\tau\_{\text{s}}^{2} + \left(v\_{\text{rel}} + a\_{\text{ego,x}}\sqrt{2\frac{d\_{\text{y}}}{a\_{\text{max,y}}}}\right)\tau\_{\text{s}} - d\_{0} + v\_{\text{rel}}\sqrt{2\frac{d\_{\text{y}}}{a\_{\text{max,y}}}} = 0\tag{9}$$

The metrics described by (6)–(9) are used below to establish the thresholds and assess criticality.
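Since (8) and (9) are quadratic in *τ*<sup>b</sup> and *τ*<sup>s</sup>, they can be solved in closed form; taking the smallest positive root as the physically meaningful one is our assumption, as are the function names in this sketch:

```python
import math

def smallest_positive_root(a, b, c):
    """Smallest positive root of a*x^2 + b*x + c = 0, or None if there is none."""
    if abs(a) < 1e-12:                     # degenerate linear case (a_ego,x = 0)
        root = -c / b
        return root if root > 0 else None
    disc = b * b - 4 * a * c
    if disc < 0:
        return None
    r1 = (-b - math.sqrt(disc)) / (2 * a)
    r2 = (-b + math.sqrt(disc)) / (2 * a)
    positive = [r for r in (r1, r2) if r > 0]
    return min(positive) if positive else None

def solve_tau_b(d0, v_rel, a_ego_x, a_min_x):
    # Coefficients of Eq. (8)
    a = 0.5 * (a_ego_x - a_ego_x**2 / a_min_x)
    b = v_rel - v_rel * a_ego_x / a_min_x
    c = -d0 - v_rel**2 / (2 * a_min_x)
    return smallest_positive_root(a, b, c)

def solve_tau_s(d0, v_rel, a_ego_x, a_max_y, d_y=3.5):
    # Coefficients of Eq. (9); t_ev is the evasion time from Eq. (5)
    t_ev = math.sqrt(2 * d_y / a_max_y)
    a = 0.5 * a_ego_x
    b = v_rel + a_ego_x * t_ev
    c = -d0 + v_rel * t_ev
    return smallest_positive_root(a, b, c)
```

For *a*ego,x = 0 the quadratics collapse to the constant-velocity relations (1) and (2).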

#### **3. Assessment Method**

#### *3.1. Concept of Fictive Vehicles*

Figure 3 shows the novel concept of fictive vehicles applied to the vehicle configurations taken from PEGASUS [22,23], where nine vehicles in the immediate surroundings of the ego vehicle are considered safety-relevant for scenario development or analysis. The vehicles EGO<sup>r</sup><sub>f</sub> and EGO<sup>l</sup><sub>f</sub> represent fictive duplicates of the ego vehicle and are needed for the assessment method described below. However, there is one condition under which a fictive vehicle can be placed in an adjacent lane: the space in the neighboring lane next to the ego must be unobstructed. For better interpretability, the environment around the ego vehicle is divided into trailing traffic, leading traffic, and the part adjacent to the ego vehicle, which consists of the fictive vehicles.

**Figure 3.** Graphical representation of the two-lane and three-lane scenarios.

#### *3.2. Definition of the Criticality*

The criticality definition consists of two parts. The first part deals with the definition of variable criticality threshold (VCT) based on the metrics introduced in Section 2. The second part deals with the quantification of the criticality based on the VCT.

#### 3.2.1. Definition of Variable Criticality Thresholds

To determine the VCT, we build upon the required accelerations *a*req,x and *a*req,y for braking and steering, respectively, as well as on the minimum safety distances for braking (*d*b,min) and steering (*d*s,min). The distances are calculated based on the mathematical model of responsibility-sensitive safety (RSS) [24], which accounts for the worst-case scenario, including the response time of the ego and the maximum theoretical braking of the target vehicle to a standstill. However, we introduce slight modifications for calculating *d*b,min, shown in (10). Unlike RSS, we do not assume the maximum longitudinal acceleration of the ego during the response time; instead, we use the ego's current acceleration. Another adjustment concerns the ego's deceleration after the response time, where we use *a*req,x, which denotes braking but is not limited to small values.

$$d\_{\rm b,min} = v\_{\rm ego} \, t\_{\rho} + \frac{1}{2} a\_{\rm ego,x} \, t\_{\rho}^2 - \frac{(v\_{\rm ego} + t\_{\rho} \, a\_{\rm ego,x})^2}{2 \, a\_{\rm req,x}} + \frac{v\_{\rm t}^2}{2 \, a\_{\rm min,x}} \tag{10}$$

The safety distance for steering, *d*s,min, is determined in a similar manner (11). The only difference is that the braking distance of the ego is substituted with the required distance to perform a lane change.

$$d\_{\rm s,min} = v\_{\rm ego} \, t\_{\rho} + \frac{1}{2} a\_{\rm ego,x} \, t\_{\rho}^2 + \sqrt{2 \frac{d\_{\rm y}}{a\_{\rm req,y}}} \left( v\_{\rm ego} + a\_{\rm ego,x} \, t\_{\rho} \right) + \frac{v\_{\rm t}^2}{2 \, a\_{\rm min,x}} \tag{11}$$
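A sketch of (10) and (11), following the modification described above (the ego keeps its current acceleration *a*ego,x during the response time *t*<sub>ρ</sub>); the sign conventions *a*req,x, *a*min,x < 0 and *a*req,y > 0, as well as the parameter values used below, are our assumptions:

```python
import math

def d_b_min(v_ego, v_t, a_ego_x, a_req_x, a_min_x, t_rho):
    """Eq. (10): minimum safety distance for braking (a_req_x, a_min_x < 0)."""
    v_after_response = v_ego + a_ego_x * t_rho      # ego keeps its current acceleration
    return (v_ego * t_rho + 0.5 * a_ego_x * t_rho**2
            - v_after_response**2 / (2 * a_req_x)
            + v_t**2 / (2 * a_min_x))               # target brakes maximally to standstill

def d_s_min(v_ego, v_t, a_ego_x, a_req_y, a_min_x, t_rho, d_y=3.5):
    """Eq. (11): the braking distance is replaced by the lane-change distance (a_req_y > 0)."""
    v_after_response = v_ego + a_ego_x * t_rho
    return (v_ego * t_rho + 0.5 * a_ego_x * t_rho**2
            + math.sqrt(2 * d_y / a_req_y) * v_after_response
            + v_t**2 / (2 * a_min_x))
```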

Substituting the initial distance *d*<sup>0</sup> in (8) and (9) with the minimum distances *d*b,min and *d*s,min, the threshold values for TTB (12) and TTS (13) can be determined. In this way, the safety distance is transferred into the time domain and governed by *a*req,x and *a*req,y. With *a*req,x and *a*req,y, it is possible to interpret the thresholds and link them to objective or subjective safety and comfort investigations. Now, at a certain *τ*<sup>b</sup> where *d*<sup>0</sup> = *d*b,min, we can claim that, if the target vehicle suddenly brakes maximally, the ego will need to brake with *a*req,x to avoid a collision; the same applies to steering.

$$\frac{1}{2} \left( a\_{\text{ego,x}} - \frac{a\_{\text{ego,x}}^2}{a\_{\text{min,x}}} \right) \tau\_{\text{b,th}}^2 + \left( v\_{\text{rel}} - \frac{v\_{\text{rel}} \, a\_{\text{ego,x}}}{a\_{\text{min,x}}} \right) \tau\_{\text{b,th}} - d\_{\text{b,min}} - \frac{v\_{\text{rel}}^2}{2 \, a\_{\text{min,x}}} = 0 \tag{12}$$

$$\frac{1}{2}a\_{\text{ego,x}}\tau\_{\text{s,th}}^2 + \left(v\_{\text{rel}} + a\_{\text{ego,x}}\sqrt{2\frac{d\_\text{y}}{a\_{\text{max,y}}}}\right)\tau\_{\text{s,th}} - d\_{\text{s,min}} + v\_{\text{rel}}\sqrt{2\frac{d\_\text{y}}{a\_{\text{max,y}}}} = 0 \tag{13}$$

In the present work, different safety thresholds were defined according to the different criticality values determined by the required accelerations *a*req,x and *a*req,y, which were obtained from the literature and are provided in Table 2. Driving comfort was taken as the criterion for the lowest deceleration threshold level. Several studies, such as [25,26], have correlated the subjective evaluation of driving comfort with objective values, one of the most significant being acceleration. Taking the relevant literature into account, −2 m/s² was considered the lowest threshold for longitudinal acceleration. The other extreme is the highest threshold, which corresponds to emergency braking and depends on the current friction coefficient; in this study, we defined the emergency level as −*μ g*. In between these limits, and in line with the criticality levels defined below, we defined two intermediate levels based on the partial braking of adaptive cruise control (ACC) [27]: the lower intermediate level is −3 m/s² and the higher intermediate level is −5 m/s². Furthermore, the lateral requirement (*a*req,y) is determined from the lane change duration. For a comfortable lane change, we chose 6 s, an acceptable value according to the driving simulator research in [28]. The intermediate values are an educated guess based on the analysis of the naturalistic driving data in [29]. According to [29], the lane change duration varies from 0.7 s to 16.1 s, with a mean of 3.81 s and a standard deviation of 2.03 s. For the first intermediate step, we chose 4 s because it fits the studies conducted in [28,29]. The second intermediate step is 2 s, based on the standard deviation reported in [29]. The emergency maneuver is set, as previously defined, with *a*max,y = 7 m/s², which also fits the order of magnitude of the lower values in [29].
It should be highlighted that the values for lateral acceleration in Table 2 are derived from subjective studies on lane changing and calculated using the simplified equation for the evasion time *t*ev (5), where the lateral evasion distance *d*<sup>y</sup> is the lane width of 3.5 m. Therefore, different lateral accelerations can be expected if other studies or more advanced equations for *t*ev are considered.
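As a plausibility check, inverting the simplified evasion-time Equation (5) gives the lateral acceleration implied by a given lane-change duration. The resulting values are of the same order as the Table 2 levels, though the paper's exact figures may be rounded or derived with additional considerations:

```python
def a_req_y(t_ev, d_y=3.5):
    """Lateral acceleration implied by a lane-change duration t_ev (inverted Eq. (5))."""
    return 2 * d_y / t_ev**2

# Lane-change durations discussed in the text: 6 s (comfortable), 4 s and 2 s (intermediate)
for t in (6.0, 4.0, 2.0):
    print(f"t_ev = {t:.0f} s -> a_req,y = {a_req_y(t):.2f} m/s^2")
```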


**Table 2.** Acceleration levels for lateral and longitudinal movement.

#### 3.2.2. Criticality Scale

Based on the VCT, the criticality is quantified with four criticality values, *ci*. This procedure is illustrated in Figure 4, where the index *i* ∈ {1, 2, 3, 4} represents the value for a certain criticality level. To calculate those values, the relations (8), (9), (12) and (13), and the TTRs are needed. Using the longitudinal and lateral acceleration levels from Table 2 as the required accelerations in the safety distances *d*b,min and *d*s,min in (10) and (11), the distance levels are calculated for the longitudinal and lateral directions in (14).

Longitudinal direction

$$\begin{aligned} d\_{\mathrm{b,min}}^{\mathrm{L}} &= d\_{\mathrm{b,min}} (a\_{\mathrm{req,x}} = -2 \,\mathrm{m/s^2}) \\ d\_{\mathrm{b,min}}^{\mathrm{int,1}} &= d\_{\mathrm{b,min}} (a\_{\mathrm{req,x}} = -3 \,\mathrm{m/s^2}) \\ d\_{\mathrm{b,min}}^{\mathrm{int,2}} &= d\_{\mathrm{b,min}} (a\_{\mathrm{req,x}} = -5 \,\mathrm{m/s^2}) \\ d\_{\mathrm{b,min}}^{\mathrm{H}} &= d\_{\mathrm{b,min}} (a\_{\mathrm{req,x}} = -\mu \,\mathrm{g}) \end{aligned}$$

Lateral direction

$$\begin{aligned} d\_{\rm s,min}^{\rm L} &= d\_{\rm s,min} (a\_{\rm req,y} = 0.2 \,\mathrm{m/s^2}) \\ d\_{\rm s,min}^{\rm int,1} &= d\_{\rm s,min} (a\_{\rm req,y} = 0.5 \,\mathrm{m/s^2}) \\ d\_{\rm s,min}^{\rm int,2} &= d\_{\rm s,min} (a\_{\rm req,y} = 1.9 \,\mathrm{m/s^2}) \\ d\_{\rm s,min}^{\rm H} &= d\_{\rm s,min} (a\_{\rm req,y} = \mu \,\mathrm{g}) \end{aligned} \tag{14}$$

The minimum distances *d*<sup>j</sup><sub>b,min</sub> and *d*<sup>j</sup><sub>s,min</sub> with *j* ∈ {L, int,1, int,2, H} from (14) represent the minimum distance between the ego and the target vehicle at which an evasive maneuver must be performed with the corresponding *a*req,x or *a*req,y to avoid a collision. Applying these distances in the threshold relations (12) and (13), the VCTs are defined; the VCT levels are calculated according to (15).

TTB ≥ TTS

$$\begin{aligned} \tau\_{\rm th}^{\rm L} &= \tau\_{\rm b,th} (d\_{\rm b,min} = d\_{\rm b,min}^{\rm L})\\ \tau\_{\rm th}^{\rm int,1} &= \tau\_{\rm b,th} (d\_{\rm b,min} = d\_{\rm b,min}^{\rm int,1})\\ \tau\_{\rm th}^{\rm int,2} &= \tau\_{\rm b,th} (d\_{\rm b,min} = d\_{\rm b,min}^{\rm int,2})\\ \tau\_{\rm th}^{\rm H} &= \tau\_{\rm b,th} (d\_{\rm b,min} = d\_{\rm b,min}^{\rm H}) \end{aligned}$$

TTB < TTS

$$
\begin{aligned}
\tau\_{\rm th}^{\rm L} &= \tau\_{\rm s,th} \left(d\_{\rm s,min} = d\_{\rm s,min}^{\rm L}\right) \\
\tau\_{\rm th}^{\rm int,1} &= \tau\_{\rm s,th} \left(d\_{\rm s,min} = d\_{\rm s,min}^{\rm int,1}\right) \\
\tau\_{\rm th}^{\rm int,2} &= \tau\_{\rm s,th} \left(d\_{\rm s,min} = d\_{\rm s,min}^{\rm int,2}\right) \\
\tau\_{\rm th}^{\rm H} &= \tau\_{\rm s,th} \left(d\_{\rm s,min} = d\_{\rm s,min}^{\rm H}\right)
\end{aligned}
\tag{15}
$$

These threshold values define four intervals for the TTR. Depending on the interval in which the TTR lies, a criticality value is determined according to (16). The less time available for an evasive maneuver, the more critical the situation. It is important to note that the threshold intervals change over time and depend on the vehicle states, such as *v*ego, *a*ego and *v*t.

$$c\_{i} = \begin{cases} c\_{1} = 1, & \text{TTR} \ge \tau\_{\text{th}}^{\text{L}} \\ c\_{2} = 2, & \tau\_{\text{th}}^{\text{L}} > & \text{TTR} \ge \tau\_{\text{th}}^{\text{int,1}} \\ c\_{3} = 3, & \tau\_{\text{th}}^{\text{int,1}} > & \text{TTR} \ge \tau\_{\text{th}}^{\text{int,2}} \\ c\_{4} = 4, & \tau\_{\text{th}}^{\text{int,2}} > & \text{TTR} \ge \tau\_{\text{th}}^{\text{H}} \end{cases} \tag{16}$$
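The mapping (16) from a TTR value onto a criticality level can be sketched as follows; note that (16) itself only defines *c*<sub>4</sub> down to *τ*<sup>H</sup><sub>th</sub>, so returning level 4 below that threshold (where a collision can no longer be avoided) is our assumption:

```python
def criticality(ttr, tau_L, tau_int1, tau_int2, tau_H):
    """Eq. (16): map a TTR value onto a criticality level, given the VCTs
    (tau_L > tau_int1 > tau_int2 > tau_H, all recomputed as the vehicle states evolve)."""
    if ttr >= tau_L:
        return 1          # lowest criticality
    if ttr >= tau_int1:
        return 2          # low criticality
    if ttr >= tau_int2:
        return 3
    return 4              # high criticality; below tau_H a collision is unavoidable
```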

#### *3.3. Criticality Assessment Method*

To assess the overall criticality of a scenario, we combine the criticality definition from (16) with the concept of fictive vehicles depicted in Figure 3. Considering the three-lane configuration from Figure 3, it is possible to place two fictive vehicles, as the condition of unoccupied adjacent lanes is satisfied. For each of those fictive vehicles, as well as for the ego, the criticality values according to (16) are determined. However, for the fictive vehicles, only braking is considered as an evasive maneuver, which leads to TTR = TTB, while for the ego, TTR = max{TTB, TTS}. The overall criticality *c*s is then determined by (17). This equation takes the mean criticality values calculated between the ego (*c*ego) and each fictive vehicle (*c*<sup>r</sup><sub>ego,f</sub> and *c*<sup>l</sup><sub>ego,f</sub>). The mean values are rounded up to the next higher integer, and the lower of the two is chosen as the overall criticality. In the case of equal mean values for the left and right lanes, the one with the higher TTR is taken.

$$c\_s = \min\left\{ \left\lceil \frac{c\_{\text{ego}} + c\_{\text{ego},f}^r}{2} \right\rceil, \left\lceil \frac{c\_{\text{ego}} + c\_{\text{ego},f}^l}{2} \right\rceil \right\} \tag{17}$$
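Equation (17) combines the ego's criticality with that of each fictive vehicle; a minimal sketch:

```python
import math

def overall_criticality(c_ego, c_ego_f_r, c_ego_f_l):
    """Eq. (17): round the mean criticality of the ego with each fictive
    vehicle up to the next integer, then take the lower of the two sides."""
    c_right = math.ceil((c_ego + c_ego_f_r) / 2)
    c_left = math.ceil((c_ego + c_ego_f_l) / 2)
    return min(c_right, c_left)
```

On a tie between the two sides, the text prefers the lane with the higher TTR; the returned value is the same either way.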

Additionally, if a trailing vehicle exists, the time gaps *τ*gap,l and/or *τ*gap,r are calculated and considered in the criticality assessment. The time gap *τ*gap,l refers to the trailing vehicle on the left side and *τ*gap,r to the one on the right side. The time gap itself is the time interval between the fictive ego vehicles EGO<sup>l</sup><sub>f</sub> and EGO<sup>r</sup><sub>f</sub> and the trailing vehicles T<sub>l</sub> and T<sub>r</sub>. The calculations of the time gaps for the left and right sides are presented in (18) and (19),

$$\tau\_{\text{gap},\text{l}} = \frac{s\_{\text{T}\_{\text{l}}} - s\_{\text{ego}\_{\text{f}}^{\text{l}}}}{v\_{\text{T}\_{\text{l}}}} \tag{18}$$

$$\tau\_{\text{gap},\text{r}} = \frac{s\_{\text{T}\_{\text{r}}} - s\_{\text{ego}\_{\text{f}}^{\text{r}}}}{v\_{\text{T}\_{\text{r}}}} \tag{19}$$

with *s*<sub>Tl</sub> and *s*<sub>Tr</sub> being the trailing vehicle positions; *v*<sub>Tl</sub> and *v*<sub>Tr</sub> the trailing vehicle velocities; and *s*<sub>ego,f</sub><sup>l</sup> and *s*<sub>ego,f</sub><sup>r</sup> the fictive ego vehicle positions. Here, *τ*<sup>th</sup><sub>gap,r</sub> and *τ*<sup>th</sup><sub>gap,l</sub> represent the threshold values for the time gaps. If the time gap values are lower than these thresholds, the possibility of steering is not considered in the overall assessment; in that case, TTR is set to TTB.
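The time-gap gating of (18) and (19) can be sketched as follows; the position convention (such that the difference in the numerator is positive) and the boundary behavior exactly at the threshold follow our reading of the text:

```python
def time_gap(s_trailing, s_ego_fictive, v_trailing):
    """Eqs. (18)/(19): time gap between a trailing vehicle and the fictive ego
    (positions are assumed to be measured so that the difference is positive)."""
    return (s_trailing - s_ego_fictive) / v_trailing

def steering_considered(tau_gap, tau_gap_th):
    """Steering into that lane is only considered if the gap is not below the
    threshold; otherwise TTR falls back to TTB for the overall assessment."""
    return tau_gap >= tau_gap_th
```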

#### **4. Simulation Results**

To demonstrate the introduced assessment method, the two exemplary test scenarios shown in Figure 5 are considered. In the first scenario, the ego vehicle approaches the target vehicle T2,2 in lane 2, while vehicles T1,2 and T3,2 are placed adjacent to T2,2. In the second scenario, the trailing vehicle T1,1 is added. The threshold for the time gap (*τ*<sup>th</sup><sub>gap,l</sub>) between T1,1 and EGO<sup>l</sup><sub>f</sub> is set to 3 s. This means that, if *τ*gap,l ≤ 3 s, the ego cannot consider steering into lane 1. The scenarios were created and simulated in the simulation software CarMaker from IPG Automotive; the initial positions and velocities of the vehicles are given in Table 3.

**Figure 5.** Graphical representation of test scenarios.

**Table 3.** Initial states and parameters for the test scenarios.


Figure 6 shows the traveled distances and velocities of the vehicles. While the ego vehicle keeps its speed constant, the target vehicles reduce their speed after 10 s to the defined velocities. Eventually, the scenarios lead to a collision between the ego and T2,2, as the ego vehicle does not brake and has no assistance functions. The collision is deliberately not mitigated, in order to demonstrate how the criticality of the scenario changes as the ego approaches T2,2.

**Figure 6.** Distances traveled and velocities in the tested two scenarios.

The velocity reductions differ for each lane, which causes different criticality values; this behavior is shown in Figure 7. The ego lane has the longest phase with a criticality value of 4 (high criticality), because of the aggressive speed-reduction maneuver of the front vehicle T2,2. The right lane has a slightly lower criticality, and the left lane has the lowest criticality over time. Because the left lane has a criticality of 2 (low criticality) until 7.72 s, it represents a potentially avoidable situation for the ego vehicle in this time interval. Therefore, the overall criticality *c*s for the ego vehicle is one level lower between 5.92 s and 7.62 s, as shown in Figure 7. This is only valid up to 7.62 s, where the ego vehicle comes closer to the vehicle T2,2 and the potential evasive maneuver space becomes smaller. The last point at which the ego vehicle could make an evasive maneuver is where the TTR intersects *τ*<sup>H</sup><sub>th</sub> at 11.6 s; after that, a collision can no longer be avoided.

**Figure 7.** Criticality assessment results for the two introduced scenarios.

The whole scenario behaves slightly differently when a trailing vehicle is considered. The small time gap between T1,1 and EGO<sup>l</sup><sub>f</sub> eliminates the possibility for the ego to steer into the left lane, so the overall criticality rises to the maximum for the time interval of 5–8 s; see Figure 8.

In summary, these two examples demonstrate the functionality of the criticality assessment method under the assumptions made to calculate the TTR and the chosen VCT. In addition, these particular examples were chosen to show the potential of the method by considering and analyzing the influence of the ego vehicle's surroundings on the criticality calculation.

**Figure 8.** Criticality assessment results for the time frame between 5 s and 8 s.

#### **5. Conclusions**

In the current paper, we introduced a criticality assessment method for ADS. The novelty of the method lies in the introduction of fictive ego vehicles in the ego's surroundings. To determine the overall criticality, the collision time reserve in the form of the TTR metric was introduced and evaluated using four criticality levels. The criticality levels are determined using variable threshold values, which are converted into the time domain from minimal safety distance metrics and acceleration values. The considered acceleration values provide a link between objective investigations and threshold values, while the derived VCT fills a research gap by taking into account the kinematic relations between vehicles and thus the variable nature of criticality. To derive a feasible and intelligible method, certain assumptions were made: the method is derived for straight highway roads, and the formulation assumes a constant velocity of the target vehicle.

Finally, the method was demonstrated on two highway scenarios that differ in the presence of a trailing vehicle. The results show a higher criticality for scenarios in which trailing vehicles obstruct the lane adjacent to the ego; this narrows down the alternative options for the ego and increases the overall criticality. The potential and advantage of the proposed criticality assessment method reside in the simplified consideration of the vehicles surrounding the ego and the evaluation of possible evasive actions; in addition to criticality assessment, the method could be used for decision-making applications. In future work, the proposed criticality metric and threshold calculation will be applied to a wide range of scenarios, while the mathematical formulation will be extended to relax the stated assumptions and to provide realistic lane change models for steering maneuvers.

**Author Contributions:** Conceptualization, D.N., T.M. and M.S.; methodology, D.N. and T.M.; software, D.N. and F.O.; validation, D.N., F.O., T.M. and M.S.; formal analysis, F.O., A.E., C.L. and W.S.; investigation, D.N. and T.M.; resources, D.N. and T.M.; data curation, D.N. and F.O.; writing—original draft preparation, D.N., T.M., F.O., M.S., C.L. and W.S.; writing—review and editing, D.N., T.M., F.O., M.S., C.L., A.E. and W.S.; visualization, D.N.; supervision, A.E.; project administration, A.E.; funding acquisition, A.E. and W.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** The project was part of the Austrian Research Promotion Agency FFG Program "EFREtop": 885754.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** Supported by TU Graz Open Access Publishing Fund.

**Conflicts of Interest:** The authors declare no conflict of interest.

## *Article* **Nonlinear Predictive Motion Control for Autonomous Mobile Robots Considering Active Fault-Tolerant Control and Regenerative Braking**

**Peng Hang, Baichuan Lou and Chen Lv \***

School of Mechanical and Aerospace Engineering, Nanyang Technological University, Singapore 639798, Singapore; peng.hang@ntu.edu.sg (P.H.); baichuan.lou@ntu.edu.sg (B.L.) **\*** Correspondence: lyuchen@ntu.edu.sg

**Abstract:** To further advance the performance and safety of autonomous mobile robots (AMRs), an integrated chassis control framework is proposed. In the longitudinal motion control module, a velocity-tracking controller was designed with an integrated feedforward and feedback control algorithm. In addition, the nonlinear model predictive control (NMPC) method was applied to the four-wheel steering (4WS) path-tracking controller design. To deal with the failure of key actuators, an active fault-tolerant control (AFTC) algorithm was designed by reallocating the driving or braking torques of the remaining normal actuators, with the weighted least squares (WLS) method used for torque reallocation. The simulation results show that, with AFTC, AMRs can improve driving stability and braking safety under braking failure conditions and recapture braking energy during decelerations.

**Keywords:** autonomous mobile robot; motion control; nonlinear model predictive control; active fault-tolerant control; regenerative braking

#### **1. Introduction**

Compared with traditional automated guided vehicles (AGVs), autonomous mobile robots (AMRs) have higher flexibility and intelligence, representing a more sophisticated, flexible, and cost-effective technology that benefits smart manufacturing, smart factories, and intelligent logistics [1]. AMRs are usually equipped with multiple actuators for steering, driving, and braking. An AMR is therefore an over-actuated system, since each wheel can provide an independent traction force [2], and realizing coordinated control between multiple actuators is a critical issue [3,4].

In recent years, different kinds of advanced control methods have been applied to the motion control of robots, including optimal control [5], model predictive control (MPC) [6], Reinforcement Learning (RL)-based control approaches [7], adaptive neural networks [8], and neuroadaptive learning algorithms [9]. The chassis control of an AMR usually consists of longitudinal motion control and lateral motion control [10]. Longitudinal motion control is associated with the drive and brake actuators, e.g., in-wheel motors (IWMs) and electromechanical brake (EMB) systems. In the longitudinal motion control of AMRs, velocity-tracking control benefits autonomous driving [11]. In [12], a parameter-varying controller was designed for velocity tracking, which showed high robustness. In [13], the MPC method was used for velocity-tracking controller design, which could recover braking energy through brake torque allocation. In [14], an adaptive sliding mode control (ASMC) algorithm using a Radial Basis Function (RBF) neural network was applied to the velocity-tracking controller design, which could deal with external disturbances. In addition, Anti-lock Braking System (ABS), Acceleration Slip Regulation (ASR), and traction control have also been widely studied in the longitudinal motion control of AMRs [15–17]. In the lateral motion control of AMRs, path tracking is the main task for autonomous driving [18]. In [19], a

**Citation:** Hang, P.; Lou, B.; Lv, C. Nonlinear Predictive Motion Control for Autonomous Mobile Robots Considering Active Fault-Tolerant Control and Regenerative Braking. *Sensors* **2022**, *22*, 3939. https:// doi.org/10.3390/s22103939

Academic Editor: Steven L. Waslander

Received: 4 May 2022 Accepted: 21 May 2022 Published: 23 May 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

linear quadratic regulator (LQR) technique was used for the four-wheel steering (4WS) path-tracking controller design. However, it showed poor robustness in dealing with uncertainties and disturbances. To reduce the effect of uncertainties in vehicle parameters, a robust path-tracking controller was designed with a μ-synthesis approach [20]. The MPC approach has been widely used in the path-tracking control of AMRs [21]. In [22], an adaptive path-tracking strategy was proposed based on MPC and fuzzy rules, which could guarantee vehicle stability under high-speed and large-curvature conditions. In [23], a Tube-based MPC method was applied to the path-tracking controller design, which showed strong robustness to address uncertainties and disturbances. In [24], an iterative learning control (ILC) method was used for the path-tracking control of AMR, which could improve the path-tracking performance significantly.

To deal with the failure of actuators, fault-tolerant control has been widely studied [25–27]. In [28], a synthesis method was applied to a reconfigurable fault-tolerant control system, which could deal with the failure of steering actuators. With a driving force allocation control method, the vehicle can reconstruct the distribution control strategy online under fault conditions, realizing active fault tolerance [29]. In [30], the linear-quadratic control method and the control Lyapunov function technique were used to design a hybrid fault-tolerant control algorithm for a four-wheel-driving vehicle, which can address actuator failure in the path-tracking process. In [31], a robust fault-tolerant control scheme was designed for distributed actuated electric vehicles, which integrated cooperative game theory and terminal sliding mode control (SMC) into the framework of the feedback linearization method (FLM). In [32], a fault-tolerant sliding mode predictive control (SMPC) strategy was proposed to address actuator failure, in which SMC was used to improve the robustness of the MPC in the presence of modeling uncertainties and disturbances. In [33], a novel quantized SMC strategy based on a switching mechanism was proposed to compensate for actuator failure effects. In [34], minimax MPC in the delta-domain was deployed to achieve tracking performance under actuator faults, system uncertainties, and disturbances.

Most studies only consider the failure of one actuator, which cannot cover all failure conditions. In this research, all kinds of IWM failure conditions were studied. In addition, few studies consider regenerative braking and actuator failure in the motion control process of an AMR at the same time. The contributions of this research are summarized as follows: (1) to deal with system nonlinearity and external disturbances, an integrated feedforward and feedback control algorithm was designed for the longitudinal motion control of the AMR; (2) to realize the collaborative steering of 4WS, the nonlinear model predictive control (NMPC) method was applied to the path-tracking controller design; (3) to address the braking failure of actuators, an active fault-tolerant control (AFTC) algorithm was designed for the AMR by redistributing the braking torques of the remaining normal actuators.

The rest of this paper is organized as follows. Section 2 gives the problem description and control framework for the AMR. The modelling work for the control algorithm design is described in Section 3. Section 4 presents the control algorithm design for the AMR. The simulation tests are described in Section 5. Finally, Section 6 provides some conclusions and suggests future work.

#### **2. Problem Description and Control Framework**

*2.1. Control Problem Description for AMR*

To realize autonomous driving, the motion control for AMR mainly consists of longitudinal motion control and lateral motion control. Lateral motion control is reflected by the path-tracking issue. Longitudinal motion control is related to the drive and brake control, which is a critical issue in this study.

IWMs are key components of an AMR. On the one hand, in-wheel motors can be used to drive the AMR; on the other hand, regenerative braking can be realized with them, recovering braking energy. An AMR is usually equipped with four in-wheel motors for independent drive and four EMB systems for independent braking. With so many actuators, the reliability of the system decreases; therefore, safety is a critical issue for AMRs, and a braking failure of the actuators during the braking process reduces safety. To maximize the regenerative braking energy, IWMs have a higher braking priority than EMBs, and EMBs are usually used to compensate for the remaining braking force. Therefore, we mainly discuss the braking failure of IWMs in this paper.

Figure 1 shows the braking failure conditions of IWMs divided into five types, i.e., failure of one IWM, failure of two IWMs on two sides, failure of two IWMs on the same side, failure of three IWMs, and failure of four IWMs. In this paper, the AFTC algorithm is proposed to deal with all kinds of braking failure of IWMs.

**Figure 1.** Braking failure of in-wheel motors: (**a**) Failure of one in-wheel motor; (**b**) failure of two in-wheel motors on two sides; (**c**) failure of two in-wheel motors on the same side; (**d**) failure of three in-wheel motors; (**e**) failure of four in-wheel motors.

#### *2.2. Chassis Control Framework for AMR*

The chassis control framework for the AMR is illustrated in Figure 2 and mainly consists of longitudinal and lateral motion control, i.e., velocity-tracking control and path-tracking control. In the path-tracking control module, NMPC is applied to the controller design: based on the target path and the fed-back vehicle states, the path-tracking controller outputs the front and rear wheel steering angles. In the velocity-tracking control module, an integrated feedforward and feedback controller is designed. To deal with the braking failure of IWMs, an AFTC module is placed after the velocity-tracking controller. With the torque redistribution of IWMs and EMBs, the AFTC algorithm is able to maximize the regenerative braking energy and guarantee safety at the same time.

**Figure 2.** Chassis control framework for AMR.

#### **3. Modelling**

#### *3.1. Vehicle Dynamic Model*

Some assumptions are made in this paper. First, only seven degrees of freedom are considered in the vehicle dynamic model, i.e., the longitudinal, lateral, and yaw motions of the vehicle and the rotation of the four wheels; the pitch, roll, and vertical motions of the AMR are ignored. Second, drive anti-skid control is not considered in the longitudinal motion control strategy, since this paper mainly focuses on velocity-tracking and braking control. Additionally, the longitudinal acceleration of each wheel center is assumed equal to the longitudinal acceleration of the AMR at the CG.

The longitudinal dynamic model is derived as follows [35].

$$m\left(\dot{v}_x - v_y r\right) = F_x - F_w - F_f \tag{1}$$

$$F_x = F_{xfl}\cos\delta_{fl} + F_{xfr}\cos\delta_{fr} + F_{xrl}\cos\delta_{rl} + F_{xrr}\cos\delta_{rr} \tag{2}$$

$$F_w = C_D A \rho v_x^2 / 2 \tag{3}$$

$$F_f = f_r m g \tag{4}$$

where $v_x$ and $v_y$ denote the longitudinal and lateral velocities, $r$ denotes the yaw rate at the center of gravity (CG), and $F_x$ denotes the total longitudinal tire force acting on the vehicle. $F_w$ and $F_f$ denote the wind resistance and rolling resistance, respectively. $m$ denotes the vehicle mass, and $\delta_i$ ($i = fl, fr, rl, rr$) denotes the steering angle of each wheel ($fl$: front left, $fr$: front right, $rl$: rear left, $rr$: rear right). $F_{xi}$ ($i = fl, fr, rl, rr$) denotes the longitudinal force of each tire. $C_D$, $A$, and $\rho$ denote the air resistance coefficient, windward area, and air density, respectively, and $f_r$ and $g$ denote the rolling resistance coefficient and the gravitational acceleration.
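As a quick numerical illustration of (1)–(4), the resistance terms and the resulting longitudinal acceleration can be evaluated as below. The parameter values are illustrative assumptions for the sketch only, not the AMR parameters from Table 1.

```python
def longitudinal_accel(m, v_x, v_y, r, F_x, C_D, A, rho, f_r, g=9.81):
    """Net longitudinal acceleration from Eqs. (1)-(4).

    F_x is the total longitudinal tire force; the wind and rolling
    resistances are computed internally.
    """
    F_w = 0.5 * C_D * A * rho * v_x ** 2   # Eq. (3): wind resistance
    F_f = f_r * m * g                      # Eq. (4): rolling resistance
    return (F_x - F_w - F_f) / m + v_y * r  # Eq. (1) solved for v_x_dot

# Illustrative (assumed) parameters:
a = longitudinal_accel(m=1500.0, v_x=15.0, v_y=0.0, r=0.0,
                       F_x=3000.0, C_D=0.3, A=2.2, rho=1.206, f_r=0.015)
```

When the tire force exactly balances the two resistances (and $v_y r = 0$), the acceleration is zero, which is a convenient sanity check of the sign conventions in (1).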

The lateral dynamic model is expressed by [36]

$$m\left(\dot{v}_y + v_x r\right) = F_y \tag{5}$$

$$F_y = F_{yfl}\cos\delta_{fl} + F_{yfr}\cos\delta_{fr} + F_{yrl}\cos\delta_{rl} + F_{yrr}\cos\delta_{rr} \tag{6}$$

where *Fy* denotes the total lateral tire force acting on the vehicle. *Fyi* (*i* = *fl*, *fr*, *rl*, *rr*) denotes the lateral force of each tire, which is expressed with the Dugoff tire model [37].

The yaw dynamic model is written according to [38]

$$I_z \dot{r} = M_z \tag{7}$$

$$M_z = \left(F_{yfl}\cos\delta_{fl} + F_{yfr}\cos\delta_{fr}\right)l_f - \left(F_{yrl}\cos\delta_{rl} + F_{yrr}\cos\delta_{rr}\right)l_r + \Delta M_z \tag{8}$$

where *Mz* denotes the total yaw moment acting on the vehicle, *Iz* denotes the yaw inertia moment, *lf* denotes the distance from the front axle to CG, and *lr* denotes the distance from the rear axle to CG. Δ*Mz* is the external yaw moment, which is created by the torque difference between left and right wheels.

$$\Delta M_z = \left(-F_{xfl}\cos\delta_{fl} + F_{xfr}\cos\delta_{fr} - F_{xrl}\cos\delta_{rl} + F_{xrr}\cos\delta_{rr}\right)\frac{B}{2} \tag{9}$$

where *B* denotes the vehicle track.

Additionally, the dynamic model of each wheel is derived by

$$I_w \dot{\omega}_i = T_i - F_{xi} R_w \tag{10}$$

where *Ti* denotes the wheel torque, *Ti* = *Tdi* − *Tbi*, *Tdi* and *Tbi* denote the drive and brake torques, respectively, *ω<sup>i</sup>* and *Rw* denote the angular velocity of each wheel and the rolling radius of the tire, respectively, and *Iw* denotes the wheel moment of inertia.

#### *3.2. Path-Tracking Model*

As Figure 3 shows, the four-wheel vehicle model is usually reduced to a single-track model to simplify the controller design [39]. The steering angle transformation between the two models follows the Ackermann steering geometry [40].

$$\begin{cases} \tan \delta_{fl} = \dfrac{\tan \delta_f}{1 - \frac{B}{2l}\left(\tan \delta_f - \tan \delta_r\right)}, & \tan \delta_{fr} = \dfrac{\tan \delta_f}{1 + \frac{B}{2l}\left(\tan \delta_f - \tan \delta_r\right)} \\ \tan \delta_{rl} = \dfrac{\tan \delta_r}{1 - \frac{B}{2l}\left(\tan \delta_f - \tan \delta_r\right)}, & \tan \delta_{rr} = \dfrac{\tan \delta_r}{1 + \frac{B}{2l}\left(\tan \delta_f - \tan \delta_r\right)} \end{cases} \tag{11}$$

where *δ<sup>f</sup>* and *δ<sup>r</sup>* denote the front and rear steering angles, and *l* denotes the distance from the front axle to the rear axle.
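The per-wheel angle conversion is straightforward to implement. The sketch below follows the Ackermann relations above, taking the geometry factor as the track width $B$ over twice the wheelbase $l$ (consistent with the notation elsewhere in the paper); all angles are in radians.

```python
import math

def ackermann_wheel_angles(delta_f, delta_r, B, l):
    """Per-wheel steering angles from the single-track angles (Eq. (11)).

    B: track width, l: wheelbase (front-to-rear axle distance).
    """
    k = (B / (2.0 * l)) * (math.tan(delta_f) - math.tan(delta_r))
    d_fl = math.atan(math.tan(delta_f) / (1.0 - k))  # inner wheel in a left turn
    d_fr = math.atan(math.tan(delta_f) / (1.0 + k))
    d_rl = math.atan(math.tan(delta_r) / (1.0 - k))
    d_rr = math.atan(math.tan(delta_r) / (1.0 + k))
    return d_fl, d_fr, d_rl, d_rr
```

For front-only steering the inner wheel steers more than the outer one, and for equal front/rear angles the correction factor vanishes so all four wheels match the single-track angles.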

**Figure 3.** Single-track vehicle model.

The yaw angle $\varphi$ and lateral position $Y$ of the AMR at the CG are expressed as

$$\begin{cases} \dot{\varphi} = r\\ \dot{Y} = v\_x \sin \varphi + v\_y \cos \varphi \end{cases} \tag{12}$$

A combination of (5), (7), (11) and (12) yields the following path-tracking model for AMR.

$$\begin{cases} \dot{x}(t) = f(x(t), u(t)) \\ y(t) = g(x(t), u(t)) \end{cases} \tag{13}$$

$$f(x(t), u(t)) = \begin{bmatrix} -v_x r + \dfrac{F_y}{m} \\[4pt] \dfrac{M_z}{I_z} \\[4pt] r \\ v_x \sin\varphi + v_y \cos\varphi \end{bmatrix} \tag{14}$$

$$g(x(t), u(t)) = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} x(t) \tag{15}$$

where the state vector $x = \left[v_y, r, \varphi, Y\right]^T$, the output vector $y = \left[\varphi, Y\right]^T$, and the control vector $u = \left[\delta_f, \delta_r\right]^T$.

#### **4. Control Algorithm Design**

*4.1. Velocity-Tracking Control Algorithm*

For velocity-tracking controller design, (1) is rewritten as follows.

$$m\dot{v}\_x = F\_x - F\_w - F\_f + F\_c \tag{16}$$

where *Fc* = *mvyr*.

Applying a Taylor expansion to (2) with respect to $\cos\delta_i$ gives

$$F\_{\mathbf{x}} = F\_{\mathbf{x}f\mathbf{l}} + F\_{\mathbf{x}f\mathbf{r}} + F\_{\mathbf{x}rl} + F\_{\mathbf{x}rr} + F\_d \tag{17}$$

where the higher-order terms are collected in $F_d$.

Then, the simplified longitudinal dynamic model can be expressed as

$$m\dot{v}\_x = F\_{xfl} + F\_{xfr} + F\_{xrl} + F\_{xrr} - F\_w - F\_f + F\_c + F\_d \tag{18}$$

Based on the wheel dynamic model (10), it can be derived that

$$F_{xi} = \frac{T_i - I_{wi}\dot{v}_x / R_w}{R_w} \quad (i = fl, fr, rl, rr) \tag{19}$$

Substitution of (19) into (18) yields

$$\left(m + \frac{\sum I_{wi}}{R_w^2}\right)\dot{v}_x = \frac{\sum T_i}{R_w} - F_w - F_f + F_c + F_d \tag{20}$$

The total torque of the four wheels, $\sum T_i$, is defined as the longitudinal control input, which is composed of a feedforward term and a feedback term, i.e.,

$$\sum T_i = u_{ff} + u_{fb} \tag{21}$$

According to the model (20), the feedforward controller is derived as follows.

$$u_{ff} = \left(m R_w + \frac{\sum I_{wi}}{R_w}\right)\dot{v}_x^* + \left(F_w + F_f - F_c\right) R_w \tag{22}$$

where $v_x^*$ denotes the target velocity. The feedforward controller is mainly used to compensate for the control error caused by the nonlinearity of the system.
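The feedforward law (22) is a one-line computation. The sketch below mirrors it directly; the numerical values in the check are illustrative assumptions, not the paper's parameters. Substituting this torque back into (20) with zero feedback and zero disturbance reproduces the target acceleration exactly, which is the defining property of the feedforward term.

```python
def feedforward_torque(m, R_w, I_w_sum, dvx_star, F_w, F_f, F_c):
    """Feedforward part of the total wheel torque, Eq. (22).

    dvx_star: target longitudinal acceleration (derivative of v_x^*).
    I_w_sum:  sum of the four wheel moments of inertia.
    """
    return (m * R_w + I_w_sum / R_w) * dvx_star + (F_w + F_f - F_c) * R_w

# Illustrative check: with assumed parameters, feed u_ff back through
# the plant model (20) and recover the commanded deceleration.
m, R_w, I_sum = 1500.0, 0.3, 4.8
u_ff = feedforward_torque(m, R_w, I_sum, -2.0, 90.0, 220.0, 0.0)
dvx = (u_ff / R_w - 90.0 - 220.0 + 0.0) / (m + I_sum / R_w ** 2)
```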

Substitution of (22) into (20) yields

$$\left(m + \frac{\sum I_{wi}}{R_w^2}\right)\dot{e}_{v_x} = \frac{u_{fb}}{R_w} + F_d \tag{23}$$

where $e_{v_x}$ denotes the velocity-tracking error, i.e., $e_{v_x} = v_x - v_x^*$.

The feedback controller is designed as a PID controller. Defining the state vector $x_l = \left[\int_0^t e_{v_x}\,d\tau,\; e_{v_x},\; \dot{e}_{v_x}\right]^T$, the control vector $u_l = \left[u_{fb}, \dot{u}_{fb}\right]^T$, and the disturbance vector $d_l = \left[F_d, \dot{F}_d\right]^T$, (23) can be written in the following state-space form.

$$\begin{cases} \dot{x}_l = A_l x_l + B_l u_l + E_l d_l \\ y_l = C_l x_l \end{cases} \tag{24}$$

where
$$A_l = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix},\quad B_l = \begin{bmatrix} 0 & 0 \\ B_s & 0 \\ 0 & B_s \end{bmatrix},\quad C_l = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix},\quad E_l = \begin{bmatrix} 0 & 0 \\ E_s & 0 \\ 0 & E_s \end{bmatrix},$$
$$B_s = \frac{R_w}{m R_w^2 + \sum I_{wi}},\quad E_s = \frac{R_w^2}{m R_w^2 + \sum I_{wi}}.$$

To solve the feedback PID controller, the following performance index function is constructed:

$$J\_{PID} = \int\_{0}^{\infty} \left( y\_l^T Q\_l y\_l + u\_l^T R\_l u\_l \right) dt \tag{25}$$

where $Q_l$ and $R_l$ are weighting matrices, $Q_l = 10^3$, $R_l = I_{2\times 2}$.

Furthermore, the design of the feedback PID controller can be transformed into the minimization of this performance index, i.e.,

$$\min_{K}\, J_{PID}(K) \tag{26}$$

Finally, the linear-quadratic optimization approach is used to solve the feedback PID controller [41,42].
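One way to carry out this linear-quadratic solve numerically is sketched below with SciPy. The plant numbers are illustrative assumptions, and a small regularization is added to the state weight so the Riccati equation is well-posed for the integrator-chain plant; the exact solver used in [41,42] may differ.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Plant data (illustrative): m = 1500 kg, R_w = 0.3 m, sum I_wi = 4.8 kg m^2
m, R_w, I_sum = 1500.0, 0.3, 4.8
B_s = R_w / (m * R_w ** 2 + I_sum)
A_l = np.array([[0.0, 1.0, 0.0],
                [0.0, 0.0, 0.0],
                [0.0, 0.0, 0.0]])
B_l = np.array([[0.0, 0.0],
                [B_s, 0.0],
                [0.0, B_s]])
C_l = np.array([[0.0, 1.0, 0.0]])

# LQ weights from (25): Q_l = 10^3 on the output, R_l = I; the small
# epsilon term regularizes the otherwise unweighted states.
Q = C_l.T @ (1e3 * np.eye(1)) @ C_l + 1e-6 * np.eye(3)
R = np.eye(2)

P = solve_continuous_are(A_l, B_l, Q, R)
K = np.linalg.solve(R, B_l.T @ P)   # state-feedback gains = PID gains
```

The three columns of `K`'s first row act on the integral, proportional, and derivative error states, which is how the LQ solution is read back as PID gains.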

#### *4.2. Path-Tracking Control Algorithm*

For the path-tracking controller design, the path-tracking model (13) is expressed in the following discrete state-space form.

$$\begin{array}{l} x(k+1) = F(x(k), u(k)) \\ y(k) = G(x(k), u(k)) \end{array} \tag{27}$$

where $F(x(k), u(k)) = x(k) + T f(x(k), u(k))$, $G(x(k), u(k)) = g(x(k), u(k))$, and $T$ denotes the sampling time, $T = 0.02$ s.

Based on the discrete model (27), NMPC is applied to the path-tracking controller design. The prediction and control horizons are denoted by $N_p$ and $N_c$, with $N_p \ge N_c$; here, $N_p = 10$ and $N_c = 5$. The predictive outputs are then derived as follows.

$$\begin{array}{l} y(k+1) = G(x(k+1), u(k+1)) \\ y(k+2) = G(x(k+2), u(k+2)) \\ \quad\vdots \\ y(k+N_c) = G(x(k+N_c), u(k+N_c)) \\ y(k+N_c+1) = G(x(k+N_c+1), u(k+N_c)) \\ \quad\vdots \\ y(k+N_p) = G(x(k+N_p), u(k+N_c)) \end{array} \tag{28}$$

Stacking (28) yields the following output sequence.

$$y(k+1) = \left[y(k+1), y(k+2), \dots, y(k+N\_p)\right]^T \tag{29}$$

Similarly, the reference output sequence is expressed by

$$\hat{y}(k+1) = \left[\hat{y}(k+1), \hat{y}(k+2), \dots, \hat{y}(k+N\_p)\right]^T \tag{30}$$

where $\hat{y}(k+p) = \left[\varphi^*(k+p), Y^*(k+p)\right]^T$, $p = 1, \cdots, N_p$, and $\varphi^*(k+p)$ and $Y^*(k+p)$ denote the reference values of the yaw angle and lateral position.

Moreover, the control sequence is expressed as follows.

$$u(k+1) = \left[u(k+1), u(k+2), \dots, u(k+N_c)\right]^T \tag{31}$$

The proposed path-tracking controller aims to minimize the tracking error $\left\|y(k+1) - \hat{y}(k+1)\right\|^2$ with the smallest control energy $\left\|u(k+1)\right\|^2$. The following cost function is therefore constructed.

$$\begin{split} J(k) &= \sum_{i=1}^{N_p} \left[y(k+i|k) - \hat{y}(k+i|k)\right]^T Q \left[y(k+i|k) - \hat{y}(k+i|k)\right] \\ &\quad + \sum_{i=0}^{N_c-1} \left[u(k+i|k)\right]^T R \left[u(k+i|k)\right] \end{split} \tag{32}$$

where $Q$ and $R$ are diagonal weighting matrices, $Q = \mathrm{diag}\left[8\times10^3, 10^4\right]$, $R = \mathrm{diag}\left[5\times10^5, 10^6\right]$. Finally, the NMPC path-tracking controller is obtained by solving the following optimization.

$$\begin{array}{ll} \min\limits_{u(k)} & J(k) \\ \text{s.t.} & x(k+i|k) = F(x(k+i-1|k), u(k+i-1|k)) \\ & u_{\min} \le u(k+i|k) \le u_{\max} \end{array} \tag{33}$$
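The receding-horizon machinery of (27), (28), (32), and (33) can be sketched as below. The vehicle dynamics are replaced by a crude kinematic stand-in (an assumption for illustration only, as is the constant speed and the bound value), but the $N_p$-step prediction, the move-blocking beyond $N_c$, the quadratic cost, and the input bounds follow the formulation above.

```python
import numpy as np
from scipy.optimize import minimize

T, Np, Nc = 0.02, 10, 5
Q = np.diag([8e3, 1e4])            # output weights from (32)
R = np.diag([5e5, 1e6])            # control weights from (32)
u_min, u_max = -0.5, 0.5           # assumed steering bounds [rad]

def F(x, u):
    # Stand-in for the discretized vehicle model (27): yaw angle phi and
    # lateral position Y driven by front/rear steering at constant speed.
    phi, Y = x
    v = 15.0
    phi_next = phi + T * (u[0] - u[1])     # crude front/rear steering effect
    Y_next = Y + T * v * np.sin(phi)
    return np.array([phi_next, Y_next])

def cost(U_flat, x0, ref):
    U = U_flat.reshape(Nc, 2)
    x, J = x0.copy(), 0.0
    for i in range(Np):
        u = U[min(i, Nc - 1)]      # hold the last move beyond Nc, as in (28)
        x = F(x, u)
        e = x - ref                # here output = [phi, Y] = state
        J += e @ Q @ e + u @ R @ u
    return J

x0 = np.array([0.0, 0.0])
ref = np.array([0.0, 0.5])          # track a 0.5 m lateral offset
U0 = np.zeros(2 * Nc)
sol = minimize(cost, U0, args=(x0, ref), bounds=[(u_min, u_max)] * (2 * Nc))
u_apply = sol.x[:2]                 # receding horizon: apply the first move only
```

At the next sampling instant the horizon is shifted and the optimization is re-solved from the newly measured state, which is what makes the scheme closed-loop.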

#### *4.3. Active Fault-Tolerant Control Algorithm*

In this section, we only discuss the braking failure of the IWMs. If an IWM fails while driving, the AFTC mechanism is triggered immediately, and the AMR starts braking to guarantee safety. Therefore, we do not discuss the driving failure of IWMs separately.

Since the total torque of the four wheels, $\sum T_i$, has been determined in Section 4.1, it follows that

$$\sum T_i = T^{IWMs} + T^{EMBs} \tag{34}$$

where $T^{IWMs}$ and $T^{EMBs}$ denote the total torques of the four IWMs and four EMBs, respectively, i.e., $T^{IWMs} = T^{IWM}_{fl} + T^{IWM}_{fr} + T^{IWM}_{rl} + T^{IWM}_{rr}$ and $T^{EMBs} = T^{EMB}_{fl} + T^{EMB}_{fr} + T^{EMB}_{rl} + T^{EMB}_{rr}$.

In addition, the external yaw moment is generated jointly by the IWMs and EMBs, i.e.,

$$\Delta M_z = \Delta M_z^{IWMs} + \Delta M_z^{EMBs} \tag{35}$$

where $\Delta M_z^{IWMs}$ and $\Delta M_z^{EMBs}$ denote the external yaw moments generated by the IWMs and EMBs, respectively.

To guarantee yaw stability, $\Delta M_z = 0$ is required. The following work aims to distribute the torque to each IWM and EMB based on (34) and (35). Figure 4 shows the AFTC flowchart for dealing with all kinds of braking failure of IWMs.

**Figure 4.** AFTC flowchart for all kinds of braking failure of IWMs.

To maximize the regenerative braking energy, the IWMs have higher braking priority than the EMBs. Therefore, the first step is to determine whether $\left|F_x\right| \le F^{IWM}_{x\max}$, where $F^{IWM}_{x\max}$ denotes the braking force boundary of all normal IWMs, which is related to the number of failed IWMs, i.e., $i$ in Figure 4. If $\left|F_x\right| \le F^{IWM}_{x\max}$, then $F^{IWM}_x = F_x$; otherwise, $F^{IWM}_x = F^{IWM}_{x\max}$ and the EMBs compensate the remaining braking force, i.e., $F^{EMB}_x = F_x - F^{IWM}_x$, where $F^{IWM}_x$ and $F^{EMB}_x$ denote the total braking forces of the four IWMs and four EMBs, respectively.
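This priority split reduces to a small amount of saturation logic. A minimal sketch, using the sign convention that braking forces are negative and the IWM bound is given as a magnitude:

```python
def split_braking_force(F_x, F_iwm_max):
    """Priority split of the total braking force between IWMs and EMBs,
    following the first step of the AFTC flowchart.

    F_x:        total demanded longitudinal force (negative when braking).
    F_iwm_max:  magnitude bound of the remaining healthy IWMs.
    """
    if abs(F_x) <= F_iwm_max:
        F_iwm = F_x                               # IWMs handle everything
    else:
        F_iwm = -F_iwm_max if F_x < 0 else F_iwm_max  # IWMs saturate
    F_emb = F_x - F_iwm                           # EMBs compensate the rest
    return F_iwm, F_emb
```

By construction the two parts always sum back to the demanded force, so the velocity-tracking loop is unaffected by where the split falls.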

Since $\Delta M_z^{IWMs}$ cannot be zero under some failure conditions, e.g., failure of two IWMs on the same side or failure of three IWMs, the generated $-\Delta M_z^{IWMs}$ is compensated by $\Delta M_z^{EMBs}$.

Once $F^{IWM}_x$, $F^{EMB}_x$, $\Delta M_z^{IWMs}$, and $\Delta M_z^{EMBs}$ are determined, the torque distribution algorithm works out $T^{IWM}_{fl}$, $T^{IWM}_{fr}$, $T^{IWM}_{rl}$, $T^{IWM}_{rr}$, $T^{EMB}_{fl}$, $T^{EMB}_{fr}$, $T^{EMB}_{rl}$, and $T^{EMB}_{rr}$. $F^{IWM}_x$ and $F^{EMB}_x$ can be derived from $T^{IWMs}$ and $T^{EMBs}$ based on (19).

For IWMs, the following torque distribution model is derived.

$$\Lambda^{IWM} = \eta^{IWM} \Theta^{IWM} \tag{36}$$

$$\eta^{IWM} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ -\frac{B}{2R_w} & \frac{B}{2R_w} & -\frac{B}{2R_w} & \frac{B}{2R_w} \end{bmatrix} \lambda \tag{37}$$

$$\lambda = \begin{bmatrix} \lambda_{fl} & 0 & 0 & 0 \\ 0 & \lambda_{fr} & 0 & 0 \\ 0 & 0 & \lambda_{rl} & 0 \\ 0 & 0 & 0 & \lambda_{rr} \end{bmatrix} \tag{38}$$

$$\lambda_i = \begin{cases} 1, & \text{normal} \\ 0, & \text{failure of IWM}_i \end{cases} \quad (i = fl, fr, rl, rr) \tag{39}$$

where $\Lambda^{IWM} = \left[T^{IWMs}, \Delta M_z^{IWMs}\right]^T$ and $\Theta^{IWM} = \left[T^{IWM}_{fl}, T^{IWM}_{fr}, T^{IWM}_{rl}, T^{IWM}_{rr}\right]^T$.

Based on (36), the weighted least squares (WLS) method is used to distribute the torques of IWMs. The cost function for IWM torque distribution is constructed as follows.

$$\begin{aligned} \Psi^{IWM} &= \rho^{IWM} \left\|\omega^{IWM}_{\Lambda}\left(\eta^{IWM}\Theta^{IWM} - \Lambda^{IWM}\right)\right\|_2^2 + \left\|\omega^{IWM}_{\Theta}\left(\Theta^{IWM} - \Theta^{IWM}_{d}\right)\right\|_2^2 \\ \text{s.t. } &\Theta^{IWM}_{\min} \le \Theta^{IWM} \le \Theta^{IWM}_{\max} \end{aligned} \tag{40}$$

where $\rho^{IWM}$ denotes the weighting coefficient, which is usually set very large to minimize the torque distribution error, $\rho^{IWM} = 10^6$. $\Theta^{IWM}_{d}$ denotes the desired control vector, $\Theta^{IWM}_{d} = [0, 0, 0, 0]^T$. $\Theta^{IWM}_{\min}$ and $\Theta^{IWM}_{\max}$ denote the minimum and maximum control boundaries of $\Theta^{IWM}$, which are shown in Figure 5. $\omega^{IWM}_{\Lambda}$ and $\omega^{IWM}_{\Theta}$ denote the weighting matrices. In this paper, $T^{IWMs}$ and $\Delta M_z^{IWMs}$ have the same allocation weights, i.e., $\omega^{IWM}_{\Lambda} = \mathrm{diag}\left[1, 1\right]$. $T^{IWM}_{i}$ ($i = fl, fr, rl, rr$) and $F_{zi}$ are positively correlated, where $F_{zi}$ denotes the vertical load of each wheel. Thus, $\omega^{IWM}_{\Theta} = \mathrm{diag}\left[\frac{1}{F_{zfl}}, \frac{1}{F_{zfr}}, \frac{1}{F_{zrl}}, \frac{1}{F_{zrr}}\right]$.

**Figure 5.** Control boundaries of IWM.

Furthermore, (40) is rewritten as

$$\Psi^{IWM} = \left\|\underbrace{\begin{bmatrix} \left(\rho^{IWM}\right)^{1/2}\omega^{IWM}_{\Lambda}\eta^{IWM} \\ \omega^{IWM}_{\Theta} \end{bmatrix}}_{A^{IWM}} \Theta^{IWM} - \underbrace{\begin{bmatrix} \left(\rho^{IWM}\right)^{1/2}\omega^{IWM}_{\Lambda}\Lambda^{IWM} \\ \omega^{IWM}_{\Theta}\Theta^{IWM}_{d} \end{bmatrix}}_{B^{IWM}}\right\|_2^2 \tag{41}$$

Then, the WLS method for IWMs torque distribution is described as follows.

$$\begin{array}{l} \min\limits_{\Theta^{IWM}} \left\|A^{IWM}\Theta^{IWM} - B^{IWM}\right\|_2^2 \\ \text{s.t. } \Theta^{IWM}_{\min} \le \Theta^{IWM} \le \Theta^{IWM}_{\max} \end{array} \tag{42}$$

Based on (42), the torques of the four IWMs, i.e., $T^{IWM}_{fl}$, $T^{IWM}_{fr}$, $T^{IWM}_{rl}$, and $T^{IWM}_{rr}$, can be worked out.
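The bounded least-squares problem (42) maps directly onto SciPy's `lsq_linear`. In the sketch below, the yaw-moment row of $\eta$ is reconstructed from (9) under a small-steering-angle assumption (an assumption of this sketch), and a failed wheel's torque is driven to zero automatically: its $\eta$ column is zeroed by $\lambda$, so only the secondary term, which prefers $\Theta_d = 0$, acts on it.

```python
import numpy as np
from scipy.optimize import lsq_linear

def distribute_iwm_torques(T_total, dMz, lam, B, R_w, Fz,
                           T_min, T_max, rho=1e6):
    """WLS torque distribution for the IWMs, Eqs. (36)-(42).

    lam: health flags (1 = normal, 0 = failed), order fl, fr, rl, rr.
    Fz:  vertical wheel loads, used for the allocation weights.
    """
    c = B / (2.0 * R_w)
    # Eq. (37): total-torque row plus yaw-moment row (small angles), masked
    eta = np.array([[1.0, 1.0, 1.0, 1.0],
                    [-c,  c,  -c,  c]]) @ np.diag(lam)
    Lam = np.array([T_total, dMz])          # Lambda^IWM
    w_L = np.eye(2)                         # omega_Lambda
    w_T = np.diag(1.0 / np.asarray(Fz))     # omega_Theta
    # Stacked problem (41): minimize ||A Theta - b||^2 within the bounds
    A = np.vstack([np.sqrt(rho) * w_L @ eta, w_T])
    b = np.concatenate([np.sqrt(rho) * w_L @ Lam, np.zeros(4)])
    return lsq_linear(A, b, bounds=(T_min, T_max)).x
```

For a symmetric all-healthy case with zero yaw-moment demand, the allocation simply splits the total torque evenly across the four wheels, which is a convenient check of the stacking.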

For EMBs, the following torque distribution model is derived.

$$\Lambda^{EMB} = \eta^{EMB} \Theta^{EMB} \tag{43}$$

$$\eta^{EMB} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ -\frac{B}{2R_w} & \frac{B}{2R_w} & -\frac{B}{2R_w} & \frac{B}{2R_w} \end{bmatrix} \tag{44}$$

where $\Lambda^{EMB} = \left[T^{EMBs}, \Delta M_z^{EMBs}\right]^T$ and $\Theta^{EMB} = \left[T^{EMB}_{fl}, T^{EMB}_{fr}, T^{EMB}_{rl}, T^{EMB}_{rr}\right]^T$.

Based on (43), the cost function for EMBs torque distribution is derived as follows.

$$\begin{aligned} \Psi^{EMB} &= \rho^{EMB} \left\|\omega^{EMB}_{\Lambda}\left(\eta^{EMB}\Theta^{EMB} - \Lambda^{EMB}\right)\right\|_2^2 + \left\|\omega^{EMB}_{\Theta}\left(\Theta^{EMB} - \Theta^{EMB}_{d}\right)\right\|_2^2 \\ \text{s.t. } &\Theta^{EMB}_{\min} \le \Theta^{EMB} \le \Theta^{EMB}_{\max} \end{aligned} \tag{45}$$

where $\rho^{EMB}$ denotes the weighting coefficient, which is usually set very large to minimize the torque distribution error, $\rho^{EMB} = 10^6$. $\Theta^{EMB}_{d}$ denotes the desired control vector, $\Theta^{EMB}_{d} = [0, 0, 0, 0]^T$. $\Theta^{EMB}_{\min}$ and $\Theta^{EMB}_{\max}$ denote the minimum and maximum control boundaries of $\Theta^{EMB}$, $\Theta^{EMB}_{\min} = -200$, $\Theta^{EMB}_{\max} = 0$. $\omega^{EMB}_{\Lambda}$ and $\omega^{EMB}_{\Theta}$ denote the weighting matrices, $\omega^{EMB}_{\Lambda} = \mathrm{diag}\left[1, 1\right]$, $\omega^{EMB}_{\Theta} = \mathrm{diag}\left[\frac{1}{F_{zfl}}, \frac{1}{F_{zfr}}, \frac{1}{F_{zrl}}, \frac{1}{F_{zrr}}\right]$.

Furthermore, (45) is rewritten as

$$\Psi^{EMB} = \left\|\underbrace{\begin{bmatrix} \left(\rho^{EMB}\right)^{1/2}\omega^{EMB}_{\Lambda}\eta^{EMB} \\ \omega^{EMB}_{\Theta} \end{bmatrix}}_{A^{EMB}} \Theta^{EMB} - \underbrace{\begin{bmatrix} \left(\rho^{EMB}\right)^{1/2}\omega^{EMB}_{\Lambda}\Lambda^{EMB} \\ \omega^{EMB}_{\Theta}\Theta^{EMB}_{d} \end{bmatrix}}_{B^{EMB}}\right\|_2^2 \tag{46}$$

Then, the WLS method for EMB torque distribution is derived as follows.

$$\begin{array}{ll}\underset{\Theta^{EMB}}{\min} \left\| \mathbf{A}^{EMB} \Theta^{EMB} - \mathbf{B}^{EMB} \right\|\_{2}^{2} \\ \text{s.t. } \Theta^{EMB}\_{\text{min}} \le \Theta^{EMB} \le \Theta^{EMB}\_{\text{max}} \end{array} \tag{47}$$

Based on (47), the torques of the four EMBs, i.e., $T^{EMB}_{fl}$, $T^{EMB}_{fr}$, $T^{EMB}_{rl}$, and $T^{EMB}_{rr}$, can be worked out.

#### **5. Simulation Results and Analysis**

Three simulation cases were designed and carried out on a co-simulation platform based on Carsim and Simulink, as shown in Figure 6. Figure 6a shows the Simulink algorithm structure in the co-simulation platform, including the path-tracking control algorithm, the longitudinal velocity-tracking control algorithm, and the AFTC algorithm. All the control algorithms were implemented in Simulink, and the AMR model was built in Carsim. Through the co-simulation of Carsim and Simulink, the effectiveness and feasibility of the proposed algorithms were verified. Figure 6b shows the simulation scenario in Carsim.

**Figure 6.** Co-simulation platform based on Carsim and Simulink: (**a**) Control algorithm; (**b**) simulation scenario.

#### *5.1. Simulation Case 1*

In this case, a straight-line braking condition was carried out. The AMR accelerated to 15 m/s and then started to brake after the 10th second. Three kinds of braking modes were compared in this case, i.e., regenerative braking (IWM), mechanical braking (EMB) and hybrid braking (IWM + EMB). The three kinds of braking modes were realized based on the same AMR with the parameters in Table 1 and the same simulation platform in Figure 6. The same velocity-tracking control algorithm and path-tracking control algorithm were utilized. In this case, braking failure was not considered.


**Table 1.** AMR parameters for simulation.

The path lengths of AMR with different kinds of braking modes are illustrated in Figure 7. It was found that regenerative braking had the longest braking distance. The second was mechanical braking, and the shortest was hybrid braking. A detailed analysis is shown in Table 1. The braking distances for the three kinds of braking modes were 42.74 m, 33.15 m, 27.33 m, respectively, and the braking times for the three kinds of braking modes were 4.04 s, 3.09 s, 2.32 s, respectively. Figure 8 shows the velocities of AMR with different kinds of braking modes. Hybrid braking showed the largest deceleration among the three kinds of braking modes. It can be concluded that hybrid braking can shorten the braking distance and braking time remarkably, improving braking safety.

**Figure 7.** Path length of AMR in Case 1.

**Figure 8.** Velocity of AMR in Case 1.

The regenerative braking powers of the different braking modes are depicted in Figure 9. Mechanical braking cannot recover braking energy, and regenerative braking has a larger regenerative braking power than hybrid braking. As shown in Table 2, the regenerative braking energies for regenerative braking, mechanical braking, and hybrid braking were 6.10 × 10<sup>4</sup> J, 0 J, and 3.29 × 10<sup>4</sup> J, respectively. Due to the application of the EMBs in hybrid braking, the hybrid braking mode recovered less braking energy than the regenerative braking mode.

**Figure 9.** Regenerative braking power of AMR in Case 1.



The wheel torques of the AMR for the three braking modes are displayed in Figures 10–12, respectively. In the regenerative braking mode, only the IWMs worked, in charge of both driving and braking. In the mechanical braking mode, the IWMs were only used for driving and the EMBs for braking; therefore, the torques of the IWMs dropped to zero after the 10th second. In the hybrid braking mode, both the IWMs and EMBs were used for braking, with the EMBs compensating the remaining braking force for the IWMs and thus shortening the braking time and braking distance.

**Figure 10.** Wheel torques of AMR with regenerative braking in Case 1.

**Figure 11.** *Cont*.

**Figure 11.** Wheel torques of AMR with mechanical braking in Case 1: (**a**) IWM; (**b**) EMB; (**c**) sum.

**Figure 12.** Wheel torques of AMR with hybrid braking in Case 1: (**a**) IWM; (**b**) EMB; (**c**) sum.

From the above simulation results, it can be seen that the regenerative braking mode was beneficial to braking energy recovery. However, it led to longer braking distance, which reduces braking safety. The mechanical braking mode could shorten the braking distance but not recover the braking energy. In general, the hybrid braking mode had the advantages of the above two kinds of braking modes, i.e., maximizing the regenerative braking efficiency and advancing the braking safety.

#### *5.2. Simulation Case 2*

This case aimed to validate the AFTC algorithm for the AMR on a curved road; the hybrid braking mode was used. The AMR accelerated to 20 m/s and then started to brake after the 12th second. However, failure of the FL IWM occurred at the 10th second and failure of the RL IWM at the 12th second.

Figure 13 shows the path-tracking results of AMR under three kinds of conditions, i.e., normal (no failure), failure (without AFTC), and AFTC. It can be seen from Figure 13b that without AFTC, the AMR departed from its target path after braking failure, showing a large lateral offset. With AFTC, the AMR could realize lane-keeping after the braking failure and brake safely until stopped, as in the normal condition. The steering angles of AMR are illustrated in Figure 14. After the braking failure of IWMs, the AMR showed very

large steering angles to realize lane-keeping when without AFTC. However, with AFTC, the AMR could use torque redistribution to guarantee brake safety and lateral stability.

**Figure 13.** Path-tracking result of AMR in Case 2: (**a**) moving trajectories; (**b**) lateral offset.

**Figure 14.** Steering angles of AMR in Case 2: (**a**) braking failure; (**b**) AFTC.

The velocities of the AMR under the three conditions are depicted in Figure 15. Due to the loss of stability, the simulation without AFTC was stopped at 12.6 s; the AMR could not finish the braking process after the braking failure of the IWMs. With AFTC, the AMR realized safe braking as in the normal condition.

**Figure 15.** Velocity of AMR in Case 2.

The regenerative braking results are shown in Figure 16 and Table 3. In spite of the braking failure, the AFTC algorithm could help the AMR recover the braking energy up to 3.43 <sup>×</sup> 104 J. Due to the braking failure of FL and RL IWMs, the recovered braking energy was smaller than in the normal condition.

**Figure 16.** Regenerative braking power of AMR in Case 2.



The wheel torques of AMR with failure and with AFTC are illustrated in Figures 17 and 18, respectively. Due to the failure of FL and RL IWMs, the torques of the two IWMs changed to zero after the 10th second and the 12th second, respectively. Without AFTC, the AMR could not adjust its torque distribution to guarantee lateral stability. However, with AFTC, the EMBs redistributed the brake torque to compensate the braking force and overcome the external yaw moment caused by the braking failure of IWMs (Figure 18a,b).

**Figure 17.** Wheel torques of AMR with failure in Case 2: (**a**) IWM; (**b**) EMB; (**c**) sum.

**Figure 18.** Wheel torques of AMR with AFTC in Case 2: (**a**) IWM; (**b**) EMB; (**c**) sum.

#### *5.3. Simulation Case 3*

In this case, the braking failure condition of three IWMs was studied, further validating the effectiveness of the AFTC algorithm. The AMR accelerated to 20 m/s and then started to brake after the 12th second. However, the FL IWM had a failure at the 10th second, and the RL and RR IWMs had a braking failure at the 12th second.

The path-tracking results of the AMR in this case are illustrated in Figure 19. Similar to Case 2, without AFTC the AMR departed from its original trajectory and lost stability after the braking failure of the IWMs; moreover, the lateral offset was larger than in Case 2. In spite of the increased number of failed IWMs, AFTC helped the AMR realize lane-keeping and safe braking. Figure 20 shows the steering angles of the AMR. Without AFTC, the AMR applied very large steering angles after the braking failure, reaching the control boundaries; despite this, it could not guarantee stability and braking safety.

**Figure 19.** *Cont*.

**Figure 19.** Path-tracking result of AMR in Case 3: (**a**) moving trajectories; (**b**) lateral offset.

**Figure 20.** Steering angles of AMR in Case 3: (**a**) braking failure; (**b**) AFTC.

Figure 21 shows the velocities of the AMR under the different conditions. Under the failure condition, the simulation was stopped at 14.1 s due to the loss of stability of the AMR. However, the AFTC algorithm helped the AMR address the braking failure of the IWMs and finish the braking process safely.

**Figure 21.** Velocity of AMR in Case 3.

The regenerative braking results of the AMR are shown in Figure 22 and Table 4. In spite of the braking failure of three IWMs, the AFTC algorithm helped the AMR recover braking energy of up to 1.72 × 10<sup>4</sup> J using the remaining normal IWM.

**Figure 22.** Regenerative braking power of AMR in Case 3.


The wheel torques of AMR under the failure condition and the AFTC condition are displayed in Figures 23 and 24, respectively. After the braking failure of three IWMs, the original torque distribution algorithm could not guarantee stability and braking safety. However, AFTC could help redistribute the torque of the normal IWM and four EMBs, recovering braking energy and guaranteeing braking safety and stability.

**Figure 23.** Wheel torques of AMR with failure in Case 3: (**a**) IWM; (**b**) EMB; (**c**) sum.

**Figure 24.** *Cont*.

**Figure 24.** Wheel torques of AMR with AFTC in Case 3: (**a**) IWM; (**b**) EMB; (**c**) sum.

#### **6. Conclusions**

A chassis control framework was designed for an AMR. To address the braking failure of IWMs, an AFTC algorithm was studied by redistributing the braking torques of normal IWMs and four EMBs. Torque redistribution was carried out based on the WLS method. Three simulation cases were conducted to evaluate the feasibility and effectiveness of the proposed control algorithms. The simulation results indicate that the hybrid braking mode can help AMR recover the braking energy and advance braking safety. Moreover, the AFTC algorithm can deal with the braking failure of IWMs and realize braking energy recovery at the same time.

The hybrid conditions of IWM braking failure and EMB braking failure will be studied in future work.

**Author Contributions:** Writing—original draft preparation, P.H.; writing—review and editing, B.L.; supervision, C.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by the Agency for Science, Technology and Research (A\*STAR) under its IAF-ICP Programme ICP1900093 and the Schaeffler Hub for Advanced Research at NTU.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy reason.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Generalized Single-Vehicle-Based Graph Reinforcement Learning for Decision-Making in Autonomous Driving**

**Fan Yang 1, Xueyuan Li 1,\*, Qi Liu 1, Zirui Li 1,2 and Xin Gao <sup>1</sup>**


**Abstract:** In the autonomous driving process, the decision-making system is mainly used to provide macro-control instructions based on the information captured by the sensing system. Learning-based algorithms have apparent advantages in information processing and understanding for an increasingly complex driving environment. To incorporate the interactive information between agents in the environment into the decision-making process, this paper proposes a generalized single-vehicle-based graph neural network reinforcement learning algorithm (SGRL algorithm). The SGRL algorithm introduces graph convolution into the traditional deep Q-network (DQN) algorithm, adopts a single-agent training method, designs a more explicit incentive reward function, and significantly expands the dimension of the action space. The SGRL algorithm is compared with the traditional DQN algorithm (NGRL) and the multi-agent training algorithm (MGRL) in a highway ramp scenario. Results show that the SGRL algorithm has outstanding advantages in network convergence, decision-making effect, and training efficiency.

**Keywords:** autonomous driving; decision-making; graph convolution; deep reinforcement learning

#### **1. Introduction**

In autonomous driving, the decision-making system is mainly used to produce advanced actions of vehicles, such as lane changing, acceleration, braking, and so on. Tactical decision-making for autonomous driving is challenging due to the diversity of environments, the uncertainty in the sensor information, and the complex interaction with other road users [1,2]. Traditional vehicle trajectory modeling and tracking control methods, such as genetic algorithms, neural networks, and their optimizations, have played a positive role in the research of decision-making [3].

The operational space of an autonomous vehicle (AV) can be diverse and vary significantly. Due to this, formulating a rule-based decision-maker for selecting driving maneuvers may not be ideal [4]. With the development of deep learning, the domain of reinforcement learning (RL) has become a robust learning framework now capable of learning complex policies in high-dimensional environments [5]. Therefore, using reinforcement learning to solve decision-making problems has gradually become the mainstream of research. Carl-Johan Hoel et al. introduce a method based on deep reinforcement learning for automatically generating a general-purpose decision-making function [6]. They trained an RL agent to handle a truck–trailer combination's speed and lane change decisions in a simulated environment. Hongbo Gao et al. solved sequential decision optimization problems based on the inverse reinforcement learning algorithm, and the proposed method was verified in terms of efficiency [7]. Some algorithms for planning and decision-making are based on analyzing driver behavior [8,9].

**Citation:** Yang, F.; Li, X.; Liu, Q.; Li, Z.; Gao, X. Generalized Single-Vehicle-Based Graph Reinforcement Learning for Decision-Making in Autonomous Driving. *Sensors* **2022**, *22*, 4935. https://doi.org/10.3390/s22134935

Academic Editors: Yafei Wang, Chao Huang, Peng Hang, Zhiqiang Zuo and Bo Leng

Received: 24 May 2022 Accepted: 28 June 2022 Published: 29 June 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

The realization of decision-making is based on the understanding and analysis of high-dimensional environmental information. However, the traditional reinforcement learning algorithm only has a good capacity for decision-making based on low-dimensional features [10]; it suffers from insufficient understanding in the face of more complex scenario information. The deep neural network (DNN) has a strong ability to learn representations and to generalize matching patterns from high-dimensional data. Therefore, deep reinforcement learning (DRL) algorithms are effective in tasks requiring feature representation and policy learning, e.g., autonomous driving decision-making [11]. Using the functional approximation ability of a DNN, intelligent controllers integrating artificial intelligence technologies such as deep learning (DL) and reinforcement learning (RL) have been designed for lane keeping and obstacle avoidance [12–14], decision-making [15–21], longitudinal control [22], merging maneuvers [23], human-like driving strategies [24–26], and other large-scale autonomous driving control tasks. Yingjun Ye et al. put forward a framework for decision-making training and learning that consists of a deep reinforcement learning (DRL) training program and a high-fidelity virtual simulation environment [27]. Compared with the value-function-based DQN algorithm, the deep deterministic policy gradient (DDPG) algorithm, which is based on an action policy, can handle continuous action spaces. Haifei Zhang et al. used the DDPG algorithm to solve the control problem of automatic driving based on a reasonable reward function, deep convolution network, and exploration policy [28].

Interaction between vehicles in common public transport scenarios is necessary and pervasive. However, there are relatively few studies on how autonomous vehicles interact in public environments with reinforcement learning. Realizing coordination between vehicles in a shared environment is challenging due to the unique feature of vehicular mobility, which makes it infeasible to apply the existing reinforcement learning methods directly. Chao Yu et al. proposed using a dynamic coordination graph to model the continuously changing topology during vehicles' interactions and developed two basic learning approaches to coordinate the driving maneuvers for a group of vehicles [29].

Cooperative vehicle–infrastructure systems and automatic driving technology are developing rapidly [30]. The graph neural network (GNN) has gained increasing popularity in various domains, including social network analysis [31,32]. A GNN can extract relational data representations and generate useful node embeddings from the node features and the features of neighboring nodes. The interactions between the ego vehicle and other surrounding vehicles can also be represented by the dynamic potential field (DPF) and embedded in the gap acceptance model to ensure safety and personalization during driving [33]. Jiqian Dong et al. proposed a novel deep reinforcement learning (DRL)-based approach combining the graph convolutional network (GCN) and the deep Q-network (DQN), namely the graph convolutional Q-network, as the information fusion module and decision processor [34]. The proposed model can aggregate the information obtained by collaborative perception and output collaborative lane change decisions for multiple vehicles. Even in the case of highly dynamic and partially observed mixed traffic, the driving intention can still be satisfied.

However, the above multi-agent training-based GNN reinforcement learning (MGRL) has the following problems in the actual verification process (the scenario of highway ramp exiting):


Given the above problems, this paper proposes an improved single-agent GNN-based RL algorithm (SGRL algorithm). This paper has the following contributions:


The paper is organized as follows. Section 2 introduces the proposed SGRL algorithm in detail. Section 3 shows the model training and testing of the SGRL algorithm and comparison algorithms (MGRL and NGRL). Section 4 shows the results of training and testing and analyzes the comparison. Finally, Section 5 derives the conclusions and proposes future improvement directions.

#### **2. Method**

The proposed SGRL methods are used to model the Markov Decision Process (MDP). The agents can explore the environment by observing states, taking actions, and receiving rewards, as shown in Figure 1.
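The observe–act–reward loop of Figure 1 can be illustrated with a minimal sketch. The `Env` class below is a toy stand-in with made-up dynamics, not the SUMO scenario used later in the paper:

```python
# Generic MDP interaction loop: observe state, take action, receive reward.
# Env and its transition/reward rules are illustrative placeholders.
import random

class Env:
    def reset(self):
        self.t = 0
        return 0.0                       # initial state

    def step(self, action):
        self.t += 1
        state = self.t + action          # toy transition
        reward = 1.0 if action > 0 else -1.0
        done = self.t >= 5               # fixed-length episode
        return state, reward, done

env = Env()
state, done, total = env.reset(), False, 0.0
while not done:
    action = random.choice([-1, 0, 1])   # placeholder policy
    state, reward, done = env.step(action)
    total += reward
```

In the paper's setting, the state is the pair (*Xt*, *Ct*), the action is a discrete control command, and the reward is the aggregate described in Section 2.4.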

**Figure 1.** Markov Decision Process.

The implementation core of the SGRL method is based on the preprocessing of input data, the structure of the deep neural network, and the setting of the output end.

#### *2.1. Network Input*

In each time step of the training process, the vehicles in the scenario can be divided into human-driven vehicles (HVs) and autonomous vehicles (AVs). At each step *t*, the input of the training model can be divided into two parts: the feature matrix *Xt* and the correlation matrix *Ct*, so the state is *st* = (*Xt*, *Ct*). The feature matrix *Xt* contains the necessary basic information for highway scenarios. The total number of human-driven vehicles (HVs) and autonomous vehicles (AVs) is set to *N* = *NHV* + *NAV*, so the specification of the matrix is *N* × 8. The eight characteristic parameters of each vehicle describe the speed, lateral position, longitudinal position, and destination information, denoted as *xi* = (*vi*, *pi*, *li*, *Ii*). To serialize multiple data at the same scale, the parameters are set as follows:


Based on GNN, we can introduce a correlation matrix *Ct* to calculate the relationship between vehicle nodes. Considering that the sensors installed on autonomous vehicles have fixed sensing ranges, only vehicles within the sensor sensing range are recorded in the correlation matrix. *Ct* is a square matrix with the specification *N* × *N*. The row data *i* of the matrix represent the relationship between vehicle *i* and other vehicles through binary data. The value *Cij* of the vehicle *j* within the set distance of the target vehicle *i* will be set to 1, as shown in Figure 2.
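This distance-thresholded construction can be sketched as follows. The function name `build_correlation_matrix` and the Euclidean distance metric are illustrative assumptions; the paper only specifies a fixed sensing range:

```python
# Sketch of the binary correlation matrix C_t described above:
# C[i][j] = 1 if vehicle j is within the sensing range of vehicle i, else 0.

def build_correlation_matrix(positions, sense_range):
    n = len(positions)
    C = [[0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(positions):
        for j, (xj, yj) in enumerate(positions):
            dist = ((xi - xj) ** 2 + (yi - yj) ** 2) ** 0.5
            if dist <= sense_range:
                C[i][j] = 1
    return C

# Example: three vehicles given as (longitudinal, lateral) positions in metres
positions = [(0.0, 0.0), (30.0, 3.5), (200.0, 0.0)]
C = build_correlation_matrix(positions, sense_range=50.0)
# Vehicles 0 and 1 sense each other; vehicle 2 is out of range of both
```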


**Figure 2.** The schematic diagram of the correlation matrix setting. Each row in the matrix represents the correlation between a vehicle and all other vehicles, specifically the logical values of 0 and 1, where 0 represents the actual distance greater than the set distance, and 1 represents the actual distance less than the set distance.

Considering that the number of vehicles in the scene is dynamic, this situation has been considered when setting the feature matrix and the correlation matrix. The vehicles in the environment are divided into HVs and AVs, located in the upper and lower parts of the feature matrix. When the number of vehicles is less than *N*, the remaining positions in the matrix are padded with zeros.

To ensure the authenticity of the simulation process, all vehicles appear and disappear successively in the scenario, so the feature information and correlation information obtained at the step *t* cannot reach the predetermined matrix size. To ensure the standard calculation of GNN, matrix data filling is needed. Matrix segmentation prevents data confusion and error correspondence caused by random packing, as shown in Figure 3.
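A minimal sketch of this segmented zero-padding follows; the helper name `pad_features` and the fixed slot counts are assumptions for illustration:

```python
# Segmented zero-padding: AV features occupy the upper block and HV
# features the lower block of a fixed-size matrix, so that GNN input
# shapes stay constant even as vehicles enter and leave the scenario.

def pad_features(av_rows, hv_rows, n_av_max, n_hv_max, n_feat=8):
    zero = [0.0] * n_feat
    upper = av_rows + [zero] * (n_av_max - len(av_rows))  # AV segment
    lower = hv_rows + [zero] * (n_hv_max - len(hv_rows))  # HV segment
    return upper + lower

# One AV and two HVs currently in the scene, capacity 2 AVs + 4 HVs:
X = pad_features(av_rows=[[1.0] * 8],
                 hv_rows=[[2.0] * 8, [3.0] * 8],
                 n_av_max=2, n_hv_max=4)
# Row 0 holds the AV, row 1 is AV padding, rows 2-3 hold HVs, rows 4-5 are padding
```

Keeping the two segments separate prevents the data confusion that random packing would cause, as Figure 3 illustrates.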

**Figure 3.** Matrix segmentation diagram. Each row in the matrix represents the characteristic information of a vehicle, and the matrix is divided into the upper and lower parts to separate AV and HV. The green part indicates that the vehicle is not currently in the scenario.

#### *2.2. Network Structure*

In the whole network structure, the function of graph convolution is to obtain the interaction information between vehicles. The function of the full connection layer is to parse the matrix information. Each newly obtained matrix in the network needs to be input into the full connection layer for recording and information analysis. In addition, the number of nodes in each layer also plays a decisive role in the model's performance. The optimal number of nodes is selected through comparative experiments.

Data records for comparative tests are shown in Table 1.

**Table 1.** Training effect for different numbers of nodes.


At each step *t*, the feature matrix *Xt* is first fed to a Fully Connected Network (FCN) encoder *ψ*, and the matrix *Ht* ∈ R*N*×128 is obtained (Equation (1)). The original eight characteristic values are re-encoded into 128 values. The FCN is executed twice consecutively.

$$H\_t = \psi(X\_t) \in \mathbb{R}^{N \times 128} \tag{1}$$

Next is the calculation of graph convolution. The input of the convolution function is the newly obtained feature matrix *Ht* and correlation matrix *Ct*. The function output matrix *Kt* has the same specification as the original matrix *Ht* (Equation (2)).

$$K\_t = \chi(H\_t, C\_t) \in \mathbb{R}^{N \times 128} \tag{2}$$

The original feature information and the new information obtained by graph convolution are fused by matrix concatenation to form the total input of the subsequent neural network *η*.
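The encode–convolve–concatenate pipeline of Equations (1) and (2) can be sketched with toy dimensions. A 4-unit encoder stands in for the paper's 128-unit FCN, the weights are placeholders, and the graph convolution is simplified to unnormalized neighbour aggregation (the paper's exact convolution *χ* may differ):

```python
# Toy SGRL feature pipeline: encode X_t, aggregate over C_t, concatenate.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def encoder(X, W):                   # H_t = psi(X_t), cf. Equation (1)
    return matmul(X, W)

def graph_conv(H, C):                # K_t = chi(H_t, C_t), cf. Equation (2)
    return matmul(C, H)              # simplified neighbour aggregation

D = 4                                # embedding size (128 in the paper)
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 vehicles, 2 raw features
W = [[0.5] * D, [0.25] * D]                # placeholder encoder weights
C = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]      # correlation matrix

H = encoder(X, W)                    # N x D
K = graph_conv(H, C)                 # N x D, same shape as H
fused = [h + k for h, k in zip(H, K)]   # per-vehicle concatenation -> N x 2D
assert len(fused[0]) == 2 * D
```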

The feature data density changes at each stage are as follows:


The overall structure of the SGRL algorithm is shown in Figure 4.

**Figure 4.** The overall structure diagram of the model. FCN represents the full connection layer, and GNN represents the graph neural network.

#### *2.3. Network Output*

Since the vehicle longitudinal control model built into the simulation environment is still rule-based, it cannot incorporate the interaction between vehicles into the control model. The SGRL algorithm fuses longitudinal and lateral control into the same model by increasing the output dimension, which significantly simplifies the complexity of the control process and solves the coupling problem of two directions.

The SGRL algorithm is trained based on DQN, so the output action space can only be discrete. For AVs, the longitudinal control is mainly reflected in the acceleration, and the lateral control is primarily reflected in the lane change direction. Increasing the output action resolution improves control sensitivity, but it also increases computational complexity and reduces the control frequency. This paper therefore adopts a compromise: the longitudinal control is set to 11 discrete values in the interval [−5, 5], and the lateral control is set to three actions: keeping, turning left, and turning right. Thus, the output matrix of the model is set as *At* ∈ R*N*×33, as shown in Figure 5.


**Figure 5.** Model action space diagram. The matrix row direction divides the longitudinal acceleration into discrete values, and the matrix column direction represents the lateral lane change of the vehicle.
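One way to realize this 11 × 3 = 33-way discrete action space is to flatten it into a single index, as a Q-network head would output. The decoding below is a hedged sketch; the flat-index layout and the names `decode_action`, `LONG_ACTIONS`, and `LAT_ACTIONS` are illustrative, not from the paper:

```python
# Map a flat action index in [0, 33) to (longitudinal acceleration,
# lateral manoeuvre), matching the 11 x 3 action space described above.

LONG_ACTIONS = list(range(-5, 6))        # 11 accelerations: -5 ... 5
LAT_ACTIONS = ["keep", "left", "right"]  # 3 lateral manoeuvres

def decode_action(index):
    accel = LONG_ACTIONS[index % 11]
    lane = LAT_ACTIONS[index // 11]
    return accel, lane

assert decode_action(0) == (-5, "keep")
assert decode_action(16) == (0, "left")    # 16 // 11 = 1, 16 % 11 = 5
assert decode_action(32) == (5, "right")
```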

#### *2.4. Reward Function*

The setting of the reward function needs clear guidance and a strong correlation with training objectives. In the SGRL algorithm, the task success rate, security, and traffic efficiency should be considered simultaneously. The corresponding reward values are set as intention reward, crash reward, and speed reward.

#### 2.4.1. Intention Reward

To guide autonomous vehicles to complete driving tasks from the corresponding ramps out of the highway, the SGRL algorithm constructs reward gradients for different lanes, as shown in Figure 6. Merge\_0 and merge\_1, respectively, refer to the autonomous vehicles that need to drive away from ramp\_0 and ramp\_1 according to the task requirements. In the simulation experiment, the driving route of all autonomous vehicles is entirely determined by the reinforcement learning controller. Therefore, vehicles belonging to the merge\_0 category will not necessarily exit from ramp\_0; that is, they cannot complete the driving task.

**Figure 6.** Intention reward gradient diagram. The task completion of autonomous vehicles is seen as a factor in judging the quality of the current reinforcement learning model, which is manifested in the reward values that can be obtained when each vehicle is in different driving sections and lanes. The strip area represents the reward types that the corresponding vehicles can obtain from the area. Green represents reward (1 × *RI*−*Base*), blue represents punishment (−1 × *RI*−*Base*), and orange represents serious punishment (−2 × *RI*−*Base*). *RI*−*Base* is set to 1 for normalization.

To ensure practical guidance at all locations, the reward value *RI*−*t* for a fixed area is set to a constant value. To ensure that the various rewards interact properly during training, the numerical scales must be kept consistent when setting *RI*−*t*. Moreover, *RI*−*t* needs to distinguish between positive and negative values, which is more conducive to training. In addition, if the autonomous vehicle completes the task, it obtains a large positive *RI*−*t*; if the task fails, it obtains a large negative *RI*−*t*.

#### 2.4.2. Crash Reward

The safety of autonomous driving is the basis of all other characteristics. The simulation platform SUMO can detect collisions at step *t* and output the number *Ncollision*−*t* of vehicles involved in collisions. The collision reward is calculated according to *Ncollision*−*t* (Equation (3)).

$$R\_{C-t} = -N\_{collision-t}/2 \tag{3}$$

#### 2.4.3. Speed Reward

To control the consistency of the reward scale, the speed reward value *RS*−*t* needs to be normalized. *v*max = max(*v*max−*vehicle*, *v*max−*highway*) is the maximum speed for the AV. To ensure that *RS*−*t* takes both positive and negative values, the calculation is shown in Equation (4).

$$R\_{S-t} = \frac{v\_{i-t}}{v\_{\text{max}}} - 0.3\tag{4}$$

#### 2.4.4. Total Reward

The general method to calculate the total reward is a weighted summation of each component. Direct addition will lead to mutual coverage of rewards, resulting in poor training effects. The SGRL algorithm uses a new aggregation method, as shown in Equation (5).

$$R\_I = \begin{cases} \;\omega\_I \times R\_{I-t} \times R\_{S-t} + \omega\_C \times R\_{C-t} & (R\_{I-t} > 0) \\ \;\omega\_I \times R\_{I-t} + \omega\_C \times R\_{C-t} & (R\_{I-t} \le 0) \end{cases} \tag{5}$$

This calculation method takes the task success rate and safety as the primary considerations while also accounting for traffic efficiency. The problem of vehicles stopping merely to collect the intention reward is completely avoided. Moreover, the weight relationship between rewards can be adjusted through the parameters *ωI* and *ωC*. In this paper, the settings are *ω<sup>I</sup>* = 1 and *ω<sup>C</sup>* = 2.
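Equation (5) can be sketched directly in code; the helper name `total_reward` and the example inputs are illustrative, while the weights match the paper's settings (*ωI* = 1, *ωC* = 2):

```python
# Reward aggregation of Equation (5): the intention reward is scaled by
# the speed reward only when positive, so stopping in a rewarded lane
# cannot be used to farm the intention reward.

W_I, W_C = 1.0, 2.0   # omega_I and omega_C from the paper

def total_reward(r_intent, r_speed, r_crash):
    if r_intent > 0:
        return W_I * r_intent * r_speed + W_C * r_crash
    return W_I * r_intent + W_C * r_crash

# A vehicle in a rewarded lane at 60% of v_max, no collision:
r_speed = 0.6 - 0.3                    # Equation (4)
r = total_reward(1.0, r_speed, 0.0)    # positive, grows with speed
```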

#### *2.5. Model Training and Testing*

The most prominent feature of the SGRL algorithm is training for a single agent, but the obtained model can be applied to a multi-agent environment. For trained agents, the vehicles around them can be considered the same category—surrounding vehicles. Therefore, the work done by the GNN-based reinforcement learning model can be interpreted as planning and decision-making based on the characteristics and relationships of the target agent and its surrounding vehicles. The model obtained by single-agent training can be directly transplanted to other agents in the same scenario, as shown in Figure 7.

**Figure 7.** Reinforcement learning model transplant diagram.

To prevent the overfitting of the reinforcement learning process, we randomize the distribution of training vehicles and the task of autonomous vehicles in each episode based on fixed rules, as shown in Figure 8.

**Figure 8.** Scenario random setting diagram.

Algorithm 1 shows the detailed steps of training.


Algorithm 2 shows the detailed steps of testing.


```
Initialize the simulation environment
## Testing step ##
For step t = 1 to T (total test steps) do
    State decoding: (X_t, C_t) = s_t
    For AV i = 1 to N_AV do
        Move the other AVs' features behind the HVs: X_t0 = λ(X_t) ∈ R^(N×8)
        Compute Q with the trained model: Q̂_θ(s_t) = π_SGRL(X_t0, C_t) ∈ R^(N×33)
        Select a*_(t−i) = arg max Q̂_θ(s_t)
    End for
    Get the action matrix a*_t = (a*_1, a*_2, ..., a*_(N_AV))
    Execute a*_t and obtain the new state s_(t+1)
    Set s_t = s_(t+1)
End for
```

#### **3. Simulation**

*3.1. Baseline Models*

Two baseline models are introduced for comparative analysis in the simulation: the traditional DQN algorithm (NGRL) and the multi-agent training algorithm (MGRL). In the NGRL algorithm, the GNN part is removed, and the correlation matrix is used as input by splicing with the feature matrix.

The model of the MGRL algorithm is consistent with the SGRL proposed in this paper in terms of network structure, but its training process is based on multi-agent environment interaction.

By comparing the three models, the specific effects of the GNN structure and singleagent training of SGRL can be effectively analyzed. The simulation scenario is shown in Figure 9.

**Figure 9.** Simulation scenario diagram.

#### *3.2. Simulator Parameters*

The simulation scenario is a long highway with three lanes, with exit ramps at one-third and two-thirds of the total length. The speed limit for the whole road is set to 20 m/s (72 km/h) for all vehicles. The AVs and HVs enter the scenario from the left side of the road with probabilities of 0.1 and 0.4, respectively, and their initial speeds and lane positions are random.

There are six types of vehicles in the scenario, including HVs that travel straight through the highway, vehicles (HVs and AVs) that want to exit from ramp\_0 and vehicles (HVs and AVs) that want to exit from ramp\_1. The simulation environment controls the driving of HVs, and the AVs are completely controlled by the reinforcement learning model SGRL in real time.

The number of vehicles in the experiment is set as shown in Table 2.


**Table 2.** Number setting of experimental vehicles.

#### **4. Results and Discussion**

#### *4.1. Training Results*

All three models were trained for 1000 episodes. The symbolic data of the training process are the reward, average Q value and loss value. Average rewards are obtained by averaging rewards against steps. The specific changes are shown in Figures 10–13.

For the comparison of reward values in the training process, under the same reward value calculation, the reward values and average reward values of the SGRL algorithm can converge faster and have better final convergence.

The changing trend of the Q value and loss value of the SGRL algorithm in the training process is consistent with the learning process. According to the loss results, SGRL has a faster convergence speed.

**Figure 10.** Diagram of average reward. This value is the average of each step reward value.

**Figure 11.** Diagram of reward. This value is the accumulation of each single step reward value.

**Figure 12.** Diagram of average Q. This value is a training mark value in reinforcement learning.

**Figure 13.** Diagram of loss. This value represents the difference between the real network and the ideal network.

To compare the task success rate, security, and traffic efficiency, the average velocity, number of collisions, success rate, and average steps per episode must be collected. The results of the above training data are shown in Figures 14–17.

**Figure 14.** Diagram of success rate. This value is obtained by the ratio of the number of vehicles completing the task (entering the corresponding ramp) to the total number.

**Figure 15.** Diagram of collisions. This value is the number of collisions between vehicles obtained by real-time detection in the simulation scenario.

**Figure 16.** Diagram of average velocity. This value is the average velocity of all AVs in the scenario.

**Figure 17.** Diagram of average steps. This value is the number of steps experienced at the end of each episode.

The SGRL algorithm has obvious advantages regarding task success rate and average vehicle speed. In terms of collision times and average training step length, SGRL also meets driving safety requirements.

In addition, SGRL has apparent advantages in training efficiency under the premise of the same training episodes. The specific hardware parameters and training time are shown in Table 3. The comparison of the training time is shown in Table 4.

**Table 3.** Computer hardware information.


**Table 4.** Training time statistics (ten experiments for each algorithm; time unit is hours).


#### *4.2. Testing Results*

The trained model needs to be verified by the test process. To fully verify the algorithm's effectiveness, we adjust the total length of the road under the premise of the same traffic flow (20 vehicles per episode). The three algorithms are tested on 1000 m, 750 m and 500 m roads to simulate different traffic flow and congestion levels.

Each test process includes 1000 episodes. In the test process, the reward value can still be used as an essential evaluation of model performance. The simulation results of reward value and average reward value are shown in Figures 18 and 19.

**Figure 18.** Diagram of testing reward. This value is the average of the total reward value for each episode in the test.

**Figure 19.** Diagram of testing average reward. This value is the average of the average reward value of each episode in the test.

For the most complex and congested 500 m highway scenario, the longitudinal motion spatial distribution of the three algorithms throughout the test cycle is as shown in Figure 20.

**Figure 20.** Spatial distribution diagram of longitudinal movement. Based on the frequency statistics of each action output in each testing process, the probability distribution of the longitudinal action can be obtained through probability calculation. The data in the figure are obtained from the average of ten repeated experiments.

The longitudinal control of autonomous vehicles becomes more and more complex as road congestion increases, so the test results of the 500 m scene are the optimal reference. The test results show that the action output of the SGRL algorithm is mainly concentrated near 0, and the probability of large accelerations remains at an acceptable level.

To compare the task success rate, security, and traffic efficiency, the average velocity, number of collisions, success rate, and average step per episode must be collected.

The data of different methods are listed in Table 5, and the mean of the above data is shown in Figures 21–24.


**Table 5.** Performance comparison for different models.

**Figure 21.** Diagram of testing average steps. This value is the number of steps experienced at the end of each episode.

**Figure 22.** Diagram of testing success rate. This value is obtained by the ratio of the number of vehicles completing the task (entering the corresponding ramp) to the total number.

**Figure 23.** Diagram of testing collisions. This value is the number of collisions between vehicles obtained by real-time detection in the simulation scenario.

**Figure 24.** Diagram of testing average velocity. This value is the average velocity of all AVs in the scenario.

It can be seen from the figure and table that SGRL has apparent advantages in task success rate and average velocity. In terms of the number of collisions and the test steps, although the SGRL value is not the best, it is close to the best value.

#### *4.3. Results Discussion*

It can be seen that the SGRL algorithm has outstanding advantages over the MGRL algorithm and the NGRL algorithm. The SGRL algorithm can converge faster during the training process and achieve better data performance. In the testing process, the SGRL algorithm has outstanding data performance in terms of task success rate, average vehicle speed, and security.

In its reward calculation, the MGRL algorithm directly sums the reward values of all autonomous vehicles. As a result, well-performing and poorly performing vehicle behaviors offset each other, which hinders adequate parameter updating, slows convergence, and degrades the final convergence result.

The NGRL algorithm lacks the calculation consideration of the interaction process between vehicle individuals. Therefore, in the decision-making process, due to the relatively simple understanding of the environment, it cannot obtain sufficient data support, so the data performance is poor.

The SGRL algorithm considers and solves the above problems and optimizes the network structure. According to the comparison of data, it can be verified that the improvement of SGRL is obviously effective.

#### **5. Conclusions**

This paper proposes a generalized single-vehicle-based graph neural network reinforcement learning algorithm (SGRL algorithm). This algorithm combines GNN with deep reinforcement learning to solve the vehicle planning problem in the scenario of highway driving out of the ramp. The SGRL model is trained for a single agent and can be tested in multi-agent scenarios. At the same time, the algorithm sets up an improved reward function to provide a clear direction for training.

Comparing the three algorithms, the conclusions are as follows:


In future research, the continuity of model action space can be added to this algorithm, which will effectively improve the driving fluency of vehicles. In addition, the relationship between multiple agents in the scenario should not be limited to physical characteristics: decision-making, driving intention, and task priority can also be incorporated into the calculation process.

**Author Contributions:** Conceptualization, F.Y. and Q.L.; Data curation, F.Y.; Methodology, F.Y. and Z.L.; Project administration, X.L.; Resources, X.L.; Software, F.Y., Q.L. and X.G.; Supervision, X.L.; Validation, F.Y.; Visualization, F.Y.; Writing—original draft, F.Y.; Writing—review & editing, X.L. and Q.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Multi-Agent Decision-Making Modes in Uncertain Interactive Traffic Scenarios via Graph Convolution-Based Deep Reinforcement Learning**

**Xin Gao 1, Xueyuan Li 1,\*, Qi Liu 1,\*, Zirui Li 1,2, Fan Yang <sup>1</sup> and Tian Luan <sup>1</sup>**


**Abstract:** As one of the main elements of reinforcement learning, the design of the reward function is often not given enough attention when reinforcement learning is used in concrete applications, which leads to unsatisfactory performances. In this study, a reward function matrix is proposed for training various decision-making modes with emphasis on decision-making styles and further emphasis on incentives and punishments. Additionally, we model a traffic scene via a graph model to better represent the interaction between vehicles, and adopt the graph convolutional network (GCN) to extract the features of the graph structure to help the connected autonomous vehicles perform decision-making directly. Furthermore, we combine GCN with deep Q-learning and multi-step double deep Q-learning to train four decision-making modes, which are named the graph convolutional deep Q-network (GQN) and the multi-step double graph convolutional deep Q-network (MDGQN). In the simulation, the superiority of the reward function matrix is proved by comparing it with the baseline, and evaluation metrics are proposed to verify the performance differences among decision-making modes. Results show that the trained decision-making modes can satisfy various driving requirements, including task completion rate, safety requirements, comfort level, and completion efficiency, by adjusting the weight values in the reward function matrix. Finally, the decision-making modes trained by MDGQN had better performance in an uncertain highway exit scene than those trained by GQN.

**Keywords:** multi-mode decision-making; connected autonomous vehicles; reward function matrix; uncertain highway exit scene; GQN; MDGQN

#### **1. Introduction**

Artificial intelligence is in its golden development age due to the exponential increase in data production and the continuous improvements in computing power [1], and autonomous driving is one of its main applications. A comprehensive autonomous driving system integrates sensing, decision-making, and motion-controlling modules [2–4]. As the "brains" of connected autonomous vehicles (CAVs) [5], the decision-making module formulates the most reasonable control strategy according to the state feature matrix transmitted by the sensing module, the vehicle state, and the cloud transmission information [6]. Moreover, it sends the determined control strategy to the motion-controlling module, including high-level behavior and low-level control requirements [7,8]. It is crucial to complete autonomous driving tasks safely and efficiently by making reasonable decisions based on other modules [9].

In an uncertain interactive traffic scenario, the driving environment is highly dynamic and uncertain, and the influences of the driving behaviors of different traffic participants propagate continuously [10]. At the level of the transportation system as a whole, all traffic participants need to cooperate efficiently [11]. At the level of the individual participant, each vehicle must judge risk factors and make appropriate decisions promptly as the scene changes [12]. In [13], a Gaussian mixture model and an importance-weighted least-squares probability classifier were combined for scene modeling; the resulting model can identify the braking-intensity levels of new drivers even when driving data are insufficient. For CAVs, the key to participating in traffic scenarios is that each CAV must generate appropriate, cooperative behavior that matches human-driven vehicles (HVs) and other CAVs. Therefore, CAVs urgently demand efficient and accurate multi-agent decision-making technology to handle the interactions between different traffic participants.

**Citation:** Gao, X.; Li, X.; Liu, Q.; Li, Z.; Yang, F.; Luan, T. Multi-Agent Decision-Making Modes in Uncertain Interactive Traffic Scenarios via Graph Convolution-Based Deep Reinforcement Learning. *Sensors* **2022**, *22*, 4586. https://doi.org/10.3390/s22124586

Academic Editors: Chao Huang, Yafei Wang, Peng Hang, Zhiqiang Zuo and Bo Leng

Received: 23 May 2022; Accepted: 15 June 2022; Published: 17 June 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Current multi-agent decision-making technologies mainly build on deep reinforcement learning (DRL), owing to its excellent performance in high-dimensional dynamic state spaces [14]. The keys to applying DRL in uncertain interactive traffic scenarios can be summarized as follows: (1) efficient modeling of interactive traffic scenes and accurate representation of state features; (2) generating reasonable, cooperative decision-making behaviors based on uncertain scene changes and individual task requirements. The design of the reward function is an essential part of any DRL application: concretizing and quantifying the task objectives is what connects the objectives to the algorithm. The design of the reward function therefore determines whether the agent can generate reasonable and cooperative decision-making behaviors [15]. In addition, accurate modeling of the interactive traffic environment and representation of state features are prerequisites for agents to generate the expected behaviors.

In studies applying the DRL method to traffic scenarios, researchers found that in uncertain interactive scenes, sparse rewards leave agents without effective guidance [16], making the algorithm difficult to converge. To solve the sparse reward problem, researchers divide the original goal into sub-goals and assign reasonable rewards or punishments to each. In [17], DDPG was adopted to address the autonomous braking problem; the reward function was split into three parts to penalize braking too early, braking too late, and braking too hard. In [18], the reward function was divided into efficiency and safety rewards to train a decision model that balances safety and efficiency. Furthermore, considering changes in driving-task requirements and scene complexity, some studies assign different weights to the sub-reward functions, thereby training different decision-making modes. In [19], the reward function was divided into sub-reward functions based on security, efficiency, and comfort; the system realizes different control targets by adjusting the weight values in the reward function.

In decision-making research involving uncertain interactive traffic scenarios, DRL methods typically take only the individual features of each vehicle as input, ignoring the transitive interactive influence between vehicles. As a result, CAVs may fail to generate reasonable, cooperative behavior, which can reduce total traffic efficiency and increase the occurrence of traffic accidents. Graph representations can accurately describe the interactions between agents, capturing the relationships between vehicles in uncertain interactive traffic scenarios. Some researchers have therefore focused on the graph reinforcement learning (GRL) method, modeling the interactions with a graph representation [20]. In [21], a hierarchical graph neural network (GNN) framework combined with an LSTM was proposed to model the interactions of heterogeneous traffic participants (vehicles, pedestrians, and riders) and predict their trajectories. The GRL method combines a GNN with DRL: the features of the interactive scene are processed by the GNN, and cooperative behaviors are generated within the DRL framework [22]. In [23], the traffic network was modeled by dynamic traffic-flow probability graphs, and a graph convolutional policy network was used for reinforcement learning.

This paper proposes an innovative dynamic reward function matrix; various decision-making modes can be trained by adjusting its weight coefficients. Additionally, the weight coefficients are further set as functions of the rewards, forming an internal dynamic reward function. In the traffic environment adopted in this paper, the randomness of, and interactions between, HVs and CAVs are strengthened by having some human-driven vehicles make uncertain lane changes. Two GRL algorithms are used in this paper: GQN and MDGQN. Finally, we report a simulation based on the SUMO platform and a comparative analysis from various perspectives, including reward functions, algorithms, and decision-making modes. A schematic diagram of the designed framework is shown in Figure 1.

**Figure 1.** The schematic diagram of the designed framework. The letters N, A, and M represent the node features matrix, adjacency matrix, and CAV mask matrix, respectively.

To summarize, the contributions of this paper are as follows:


Additionally, we propose a set of indicators to evaluate the performance of driverless decision-making and use them to verify the performance differences among various algorithms and decision-making modes.

This article is organized as follows. Section 2 introduces the problem formulation. Section 3 describes the methods used. Section 4 proposes the reward function matrix. Section 5 presents and analyzes the simulation results. Section 6 concludes the paper and outlines future research directions.

#### **2. Problem Formulation**

#### *2.1. Scenario Construction*

As shown in Figure 2, a three-lane highway containing two exits was designed; the lanes are numbered from bottom to top as the first, second, and third lanes. All vehicles (HVs and CAVs) enter the road segment from the left of "Highway 0". The color of each vehicle indicates its type and intention. In this paper, the vehicles were configured as follows, in line with safety principles and actual human driving rules:

**Figure 2.** The highway exit scene with solid interactions between CAVs and HVs.


The interaction between CAVs and HVs is enhanced through the above settings. The setting of HV1 considerably increases the uncertainty of the scene, raising the demands on the CAVs' decision-making. Driving straight on and leaving via "Highway 2" is the most straightforward choice with the lowest collision risk, since it requires the fewest lane changes from the CAVs. In addition, because of the uncertain lane-change behavior of HV2, it is difficult for CAVs to reach "Exit 1" safely, and the collision risk of leaving the highway via "Exit 1" is higher than for the other exits. Specific scenario parameters are shown in Table 1:


**Table 1.** Specific scenario setting parameters.

#### *2.2. State Representation*

The scenario is modeled as an undirected graph. Each vehicle in this scenario is regarded as a node of the graph, and the interactions between vehicles are considered its edges. The state space can be represented by three matrices: a node features matrix *Nt*, an adjacency matrix *At*, and a CAV mask matrix *Mt*, each of which is described below.

Node features matrix: each vehicle's features are its speed, longitudinal position, lane, and intention, denoted [*Vi*, *Yi*, *Li*, *Ii*]. The node features matrix collects the features of every vehicle in the constructed scenario and can be written as follows:

$$N_t = \begin{bmatrix} [V_1, Y_1, L_1, I_1] \\ [V_2, Y_2, L_2, I_2] \\ \vdots \\ [V_i, Y_i, L_i, I_i] \\ \vdots \\ [V_n, Y_n, L_n, I_n] \end{bmatrix} \tag{1}$$

where $V_i = v_{i-actual}/v_{\max}$ denotes the ratio of the actual longitudinal velocity to the maximum longitudinal velocity; $Y_i = y_{i-actual}/L_{highway}$ denotes the actual longitudinal position as a fraction of the total length of the highway; $L_i$ denotes the one-hot encoding of the vehicle's current lane; and $I_i$ denotes the one-hot encoding of the vehicle's current intention (change to left lane, change to right lane, or go straight).

Adjacency matrix: in this paper, the interactions between vehicles take the form of information sharing between vehicles, expressed by the adjacency matrix. The calculation of the adjacency matrix is based on three assumptions:


The derivation of the adjacency matrix is as follows:

$$A_t = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & a_{ij} & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \tag{2}$$

where $a_{ij}$ denotes the edge value between the *i*th vehicle and the *j*th vehicle: $a_{ij} = 1$ denotes that the *i*th vehicle and the *j*th vehicle share information, and $a_{ij} = 0$ denotes that they share no information.

CAV mask matrix: the CAV mask is used to filter out the embeddings of HVs after the GCN fusion block. Its specific mathematical expression is as follows:

$$M_t = \begin{bmatrix} m_1, m_2, \dots, m_i, \dots, m_n \end{bmatrix} \tag{3}$$

where *mi* = 0 or 1. If the *i*th vehicle is controlled by the GRL algorithm, *mi* = 1; otherwise, *mi* = 0.

#### *2.3. Action Space*

At each time step, each CAV has a discrete action space representing potential actions to be executed at the next time step, as follows:

$$a_i = \{a_{lane-change}, a_{acceleration}\} \tag{4}$$

where $a_{lane-change}$ denotes the lane-change action, which can be a left lane change, a right lane change, or going straight; $a_{acceleration}$ denotes the discrete acceleration that a CAV can take, whose values are evenly spaced over the interval $[-8\ \mathrm{m \cdot s^{-2}}, 5\ \mathrm{m \cdot s^{-2}}]$ at increments of $0.5\ \mathrm{m \cdot s^{-2}}$.
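As an illustration, the discrete action space described above can be sketched as follows. This is a minimal sketch; the names (`LANE_ACTIONS`, `acceleration_levels`, `ACTIONS`) are our own and not from the paper.

```python
# Sketch of the discrete CAV action space of Eq. (4); names are illustrative.
# Lateral actions: left lane change, right lane change, go straight.
LANE_ACTIONS = ["left", "right", "straight"]

def acceleration_levels(a_min=-8.0, a_max=5.0, step=0.5):
    """Discrete accelerations evenly spaced over [a_min, a_max] m/s^2."""
    n = int(round((a_max - a_min) / step)) + 1
    return [round(a_min + i * step, 2) for i in range(n)]

# Each CAV action is a (lane_action, acceleration) pair.
ACTIONS = [(lane, acc) for lane in LANE_ACTIONS for acc in acceleration_levels()]
```

With the interval and step given above, this yields 27 acceleration levels and 81 combined actions per CAV per time step.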

#### **3. Methods**

This section describes the principles of the methods used, including graph convolutional neural networks, Q-learning, GQN, and MDGQN.

#### *3.1. Graph Convolutional Neural Network*

GCN is a neural network model that directly encodes graph structure; the goal is to learn a function of features on a graph. A graph can be represented as *G* = (*V*, *E*), where *V* denotes the set of nodes in the graph (with *N* the number of nodes) and *E* denotes the set of edges. The state of *G* is described by a tuple of three matrices: the feature matrix *X*, the adjacency matrix *A*, and the degree matrix *D*.


GCN is a multi-layer graph convolutional neural network, a first-order local approximation of spectral graph convolution. Each convolution layer handles only first-order neighborhood information; multi-order neighborhood information can be propagated by stacking several convolution layers.

The propagation rules for each convolution layer are as follows [24]:

$$Z = g(H, A) = \sigma\left(\hat{D}^{-1/2}\hat{A}\hat{D}^{-1/2}H^{(l)}W^{(l)} + b\right) \tag{5}$$

where $\hat{A} = A + I_N$ is the adjacency matrix of the undirected graph *G* with added self-connections, $I_N$ is the identity matrix, $\hat{D}_{ii} = \sum_j \hat{A}_{ij}$ is the corresponding degree matrix, and $W^{(l)}$ is a layer-specific trainable weight matrix. $\sigma(\cdot)$ denotes an activation function, such as $\mathrm{ReLU}(\cdot) = \max(0, \cdot)$. $H^{(l)} \in \mathbb{R}^{N \times D}$ is the matrix of activations in the *l*th layer, with $H^{(0)} = X$.
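The propagation rule of Equation (5) can be sketched numerically as follows. This is an illustrative NumPy sketch, not the authors' implementation; ReLU is assumed as the activation and the bias is a scalar for simplicity.

```python
import numpy as np

def gcn_layer(H, A, W, b=0.0):
    """One GCN propagation step, Eq. (5): sigma(D^-1/2 (A+I) D^-1/2 H W + b).
    H: (N, D) node features; A: (N, N) adjacency; W: (D, D') trainable weights."""
    A_hat = A + np.eye(A.shape[0])            # add self-connections
    d = A_hat.sum(axis=1)                     # node degrees (>= 1 thanks to self-loops)
    D_inv_sqrt = np.diag(d ** -0.5)           # D^-1/2
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalization
    return np.maximum(0.0, A_norm @ H @ W + b)  # ReLU activation
```

Stacking several such calls propagates multi-order neighborhood information, as noted above.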

#### *3.2. Deep Q-Learning*

Q-learning [25] is a value-based reinforcement learning algorithm. $Q(s_t, a_t)$ is the expected return obtained by taking action $a_t \in A_t$ in state $s_t \in S_t$ at time *t*. The environment feeds back the corresponding reward $r_t$ according to the agent's action. Each time step produces a quadruplet $(s_t, a_t, r_t, s_{t+1})$, which is stored in the experience replay buffer. Deep Q-learning [26] replaces the optimal action–value function with a deep neural network $Q(s, a, \omega)$. The principle of the algorithm is described below.

The predicted value of the DQN can be calculated from the given four-tuple $(s_t, a_t, r_t, s_{t+1})$:

$$\hat{q}_t = Q(s_t, a_t, \omega) \tag{6}$$

The TD target can be calculated based on the actually observed reward $r_t$:

$$\hat{y}_t = r_t + \gamma \cdot \max_{a \in \mathcal{A}} Q(s_{t+1}, a, \omega) \tag{7}$$

DQN updates network parameters according to the following formula:

$$\omega \leftarrow \omega - \alpha \cdot (\hat{q}_t - \hat{y}_t) \cdot \nabla_{\omega} Q(s_t, a_t, \omega) \tag{8}$$

where *α* represents the learning rate.
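Equations (6)–(8) can be sketched with a linear function approximator, for which the gradient $\nabla_\omega Q$ is simply the state vector. This is a didactic sketch only; the paper uses deep networks, and all names here are illustrative.

```python
import numpy as np

def dqn_step(w, s_t, a_t, r_t, s_next, alpha=0.01, gamma=0.9):
    """One DQN update with a linear approximator Q(s, a, w) = w[a] . s.
    w: (num_actions, state_dim) weight matrix."""
    q_pred = w[a_t] @ s_t                                           # Eq. (6)
    td_target = r_t + gamma * max(w[a] @ s_next for a in range(len(w)))  # Eq. (7)
    w = w.copy()
    w[a_t] = w[a_t] - alpha * (q_pred - td_target) * s_t            # Eq. (8)
    return w
```

For the linear case, $\nabla_\omega Q(s_t, a_t, \omega) = s_t$, so the gradient step of Equation (8) reduces to the last line above.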

#### *3.3. Graph Convolutional Q-Network (GQN)*

As described in Section 3.2, Q-learning uses a Q-table to store the Q value of each state–action pair. However, if the state and action spaces are high-dimensional or continuous, the curse of dimensionality arises: as the number of dimensions grows linearly, the computational load increases exponentially. GQN [27] replaces the optimal action–value function $Q^*(s, a)$ with the graph convolutional neural network $Q(s, a, \theta)$:

$$Q(s, a, \theta) = \rho(Z, a) \tag{9}$$

where *Z* is the node-embedding output of the graph convolution layer, *ρ* represents the neural network block, including the fully connected layers, and *θ* is the aggregation of all the weights.

The training process of GQN is the same as that of DQN [26]. First, the predicted value of GQN is calculated from the four-tuple $(s_t, a_t, r_t, s_{t+1})$ sampled from the experience replay buffer:

$$\hat{q}_t = Q(s_t, a_t, \theta_{now}) \tag{10}$$

However, since the Q-network uses its own estimate to update itself, bootstrapping occurs and deviations propagate. Therefore, another neural network, called the target network $Q(s, a, \theta^-_{now})$, is used to calculate the TD target. Its structure is exactly the same as that of the Q-network, but its parameters $\theta^-_{now}$ differ from $\theta_{now}$. The optimal action is selected and the target network is propagated forward:

$$a^* = \underset{a \in \mathcal{A}}{\arg\max}\, Q(s_{t+1}, a, \theta^-_{now}) \tag{11}$$

$$\hat{q}_{t+1} = Q(s_{t+1}, a^*, \theta^-_{now}) \tag{12}$$

The TD target $\hat{y}_t$ and TD error $\delta_t$ are then calculated:

$$\hat{y}_t = r_t + \gamma \cdot \hat{q}_{t+1} \tag{13}$$

$$\delta_t = \hat{q}_t - \hat{y}_t \tag{14}$$

The gradient $\nabla_{\theta}Q(s_t, a_t, \theta_{now})$ is calculated by backpropagation through the Q-network, whose parameters are updated by gradient descent:

$$\theta_{new} \leftarrow \theta_{now} - \alpha \cdot \delta_t \cdot \nabla_{\theta} Q(s_t, a_t, \theta_{now}) \tag{15}$$

where $\theta_{new}$ denotes the updated Q-network parameters and *α* represents the learning rate.

Finally, GQN adopts the weighted average of the two networks to update the target network parameters:

$$\theta^-_{new} \leftarrow \tau \cdot \theta_{new} + (1 - \tau) \cdot \theta^-_{now} \tag{16}$$

where *τ* represents the soft update rate.
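The soft update of Equation (16) can be sketched as follows for parameters stored in a dictionary; this is a hypothetical helper, not the authors' code.

```python
def soft_update(theta_target, theta_online, tau=0.005):
    """Eq. (16): Polyak-averaged target-network update.
    theta_target: current target-network parameters (theta^-_now).
    theta_online: current Q-network parameters (theta_new).
    tau: soft update rate (the default value here is illustrative)."""
    return {k: tau * theta_online[k] + (1 - tau) * theta_target[k]
            for k in theta_target}
```

A small *τ* keeps the target network slowly tracking the Q-network, which stabilizes the TD targets of Equations (11)–(13).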

#### *3.4. Multi-Step Double Graph Convolutional Q-Network (MDGQN)*

MDGQN extends GQN with double Q-learning and a multi-step TD target. As noted in Section 3.3, the target network cannot completely avoid bootstrapping, since its parameters are still related to those of the Q-network. The double Q-learning algorithm improves on the target network: whereas Q-learning with a target network uses the target network both to select the optimal action and to calculate its Q value, double Q-learning selects the optimal action according to the Q-network and uses the target network only to calculate the Q value of that action. Equation (11) is modified as follows:

$$a^* = \underset{a \in \mathcal{A}}{\arg\max}\, Q(s_{t+1}, a, \theta_{now}) \tag{17}$$

The multi-step TD target balances the large variance caused by Monte Carlo estimates against the large bias caused by bootstrapping. Equation (13) is modified as follows:

$$\hat{y}_t = \sum_{i=0}^{m-1} \gamma^i r_{t+i} + \gamma^m \cdot \hat{q}_{t+m} \tag{18}$$

where $\hat{y}_t$ is called the m-step TD target.
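The m-step target of Equation (18) can be sketched directly; the function name and signature are illustrative, with `q_bootstrap` standing in for $\hat{q}_{t+m}$ from the target network.

```python
def multi_step_td_target(rewards, q_bootstrap, gamma=0.9):
    """Eq. (18): m-step TD target from m observed rewards r_t ... r_{t+m-1}
    plus a bootstrapped value q_bootstrap = Q(s_{t+m}, a*, theta^-)."""
    m = len(rewards)
    return sum(gamma ** i * r for i, r in enumerate(rewards)) + gamma ** m * q_bootstrap
```

With m = 1 this reduces exactly to the one-step target of Equation (13).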

#### **4. Reward Functions**

The design of the reward function sets the criteria and goals of the DRL training process. Based on four aspects (the result of the driving task, traffic efficiency, ride comfort, and safety), the reward function is divided into four blocks. Further, we propose a reward function matrix comprising a decision-weighted coefficient matrix, an incentive-punished-weighted coefficient matrix, and a reward–penalty function matrix.

$$r = \operatorname{tr}(A_{k1} A_{k2} A_r) = \operatorname{tr}\left( \begin{bmatrix} k_r & & & \\ & k_e & & \\ & & k_c & \\ & & & k_s \end{bmatrix} \begin{bmatrix} k_{rI} & k_{rP} \\ k_{eI} & k_{eP} \\ k_{cI} & k_{cP} \\ k_{sI} & k_{sP} \end{bmatrix} \begin{bmatrix} r_{result-I} & r_{efficiency-I} & r_{comfort-I} & r_{safe-I} \\ r_{result-P} & r_{efficiency-P} & r_{comfort-P} & r_{safe-P} \end{bmatrix} \right) \tag{19}$$

Specific parameters are described below:


By adjusting the weight coefficients of the decision-weighted coefficient matrix and the incentive-punished-weighted coefficient matrix, DRL can train decision-making modes oriented toward different goals. In the decision-making process of autonomous vehicles, the upper control module can then choose among these modes according to its needs. To select decision-making modes with excellent comprehensive performance and strong contrast, we conducted multiple sets of experiments; some of the experimental data are given in Appendix A. By adjusting the parameters in Tables 2 and 3, this paper determined four decision modes: AGGI, AGGP, CONI, and CONP.
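Under one reading of Equation (19), the trace collapses to a doubly weighted sum over the four blocks (result, efficiency, comfort, safety), each with an incentive (I) and a punishment (P) term. The sketch below assumes the matrix shapes implied by that reading; it is illustrative, not the authors' implementation.

```python
import numpy as np

def reward_matrix(k_decision, k_ip, r_ip):
    """Assumed reading of Eq. (19).
    k_decision: (4,) decision weights [k_r, k_e, k_c, k_s].
    k_ip: (4, 2) incentive/punishment weights, rows = blocks, cols = [I, P].
    r_ip: (4, 2) reward/penalty values, same layout.
    Returns tr(A_k1 @ M) = sum_i k_i * (k_iI * r_iI + k_iP * r_iP)."""
    A_k1 = np.diag(k_decision)          # decision-weighted coefficient matrix
    M = k_ip @ r_ip.T                   # (4, 4); diagonal holds per-block sums
    return float(np.trace(A_k1 @ M))
```

Only the diagonal of the inner product contributes to the trace, so off-diagonal cross terms between different blocks are discarded by construction.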


**Table 2.** Weight coefficient of the decision-weighted coefficient matrix.

**Table 3.** Weight coefficient of the incentive-punished-weighted coefficient matrix.


Since changing the weight coefficients can dilute some essential rewards or punishments, this paper improves the reward function to counter this defect. The weight coefficients of the incentive-punished-weighted coefficient matrix are further set as functions of the reward functions, forming an internal dynamic reward function. The specific formulas are as follows:

$$\begin{cases} k_{rI} = k_{rI0} \cdot \exp\left((r_{result-P} + r_{comfort-P})/100{,}000\right) \\ k_{cI} = k_{cI0} \cdot \exp\left((r_{result-P} + r_{comfort-P})/80{,}000\right) \\ k_{sI} = k_{sI0} \cdot \exp\left((r_{result-P} + r_{comfort-P})/150{,}000\right) \end{cases} \tag{20}$$
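Equation (20) can be sketched as follows; since the penalty terms are non-positive, the incentive weights decay exponentially from their base values $k_{*0}$ as accumulated penalties grow in magnitude. Function and parameter names are illustrative.

```python
import math

def dynamic_incentive_weights(k_r0, k_c0, k_s0, r_result_p, r_comfort_p):
    """Eq. (20): incentive weights as functions of the accumulated
    result/comfort penalties (both <= 0). Scaling constants from Eq. (20)."""
    penalty = r_result_p + r_comfort_p
    return {
        "k_rI": k_r0 * math.exp(penalty / 100_000),
        "k_cI": k_c0 * math.exp(penalty / 80_000),
        "k_sI": k_s0 * math.exp(penalty / 150_000),
    }
```

The smallest divisor (80,000) makes the comfort incentive the most sensitive to accumulated penalties, and the largest (150,000) makes the safety incentive the least sensitive.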

Based on [3], the specific reward and penalty functions were designed. First, we designed the reward and penalty functions associated with the result of the driving task. The independent variable of the reward function is the number of CAVs and HV2 reaching their destinations, which aims to train decisions that also assist HVs in completing their driving tasks. The penalty function is designed based on collisions.

$$r\_{result-I} = 300(n\_{r1} + n\_{r2})\tag{21}$$

$$r_{result-P} = -60{,}000, \ \text{if collision} \tag{22}$$

where $n_{r1}$ is the number of CAVs leaving via "Exit 1" and $n_{r2}$ is the number of HV2 leaving via "Exit 1".

To train the decision-making model to improve traffic efficiency, this paper divides the speed interval of CAVs into three parts. Corresponding reward and penalty functions were designed to curb speeding, encourage high-speed driving, and punish low-speed blocking over these three speed ranges. To help CAVs explore the optimal speed faster and more stably, we used an exponential function to design a soft reward function [28].

$$r_{efficiency-I} = \exp\left[\frac{\theta \times (v_y - v_{y\min})}{v_{y\max}}\right], \ \text{if } v_{y\min} \le v_y \le v_{y\max} \tag{23}$$

$$r_{efficiency-P} = \begin{cases} -\exp(v_y - v_{y\max}), & \text{if } v_y > v_{y\max} \\ -\exp\left(1 + \dfrac{v_{y\min} - v_y}{3}\right), & \text{if } v_y < v_{y\min} \end{cases} \tag{24}$$

where $v_y$ is the longitudinal velocity of the CAV and $v_{y\max}$ represents the maximum velocity allowed in the current lane, namely $25\ \mathrm{m \cdot s^{-1}}$ or $15\ \mathrm{m \cdot s^{-1}}$.
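The three-branch efficiency reward of Equations (23)–(24) can be sketched as a single function. The shaping coefficient *θ* appears in Equation (23) but its value is not restated here, so it is left as a parameter; all names are illustrative.

```python
import math

def efficiency_reward(v_y, v_min, v_max, theta=1.0):
    """Eqs. (23)-(24): soft incentive inside [v_min, v_max],
    exponential penalties for speeding and low-speed blocking."""
    if v_y > v_max:                       # speeding penalty, Eq. (24) top
        return -math.exp(v_y - v_max)
    if v_y < v_min:                       # low-speed blocking penalty, Eq. (24) bottom
        return -math.exp(1 + (v_min - v_y) / 3)
    # soft reward inside the allowed band, Eq. (23)
    return math.exp(theta * (v_y - v_min) / v_max)
```

The exponential shape makes the incentive grow smoothly toward the lane's speed limit, which is the "soft reward" property cited from [28].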

To improve the ride comfort of all vehicles in this traffic section, the corresponding reward and penalty functions are designed based on the accelerations and lane-change counts of all vehicles.

$$r_{comfort-I} = \exp(3) \times n_{c1} \tag{25}$$

$$r_{comfort-P} = -2000 \times n_{c2} - \exp\left(\frac{m}{2}\right) \tag{26}$$

where $n_{c1}$ is the number of vehicles with acceleration in $[-2\ \mathrm{m \cdot s^{-2}}, 2\ \mathrm{m \cdot s^{-2}}]$, $n_{c2}$ is the number of vehicles with acceleration in $(-\infty, -4.5\ \mathrm{m \cdot s^{-2}}]$, and *m* is the number of lane changes in this traffic section within 0.5 s.

Superior safety performance is a premise of developing decision technology [29]. According to [30], the length of the CAVs' safety time is one of the most important factors affecting road safety. This paper therefore introduces the safety time of the CAVs into the corresponding reward function. The safety times are defined as follows:

$$t_1 = t_{safe-follower} = \frac{y_{AV} - y_{follower}}{v_{follower} - v_{AV}} \tag{27}$$

$$t_2 = t_{safe-leader} = \frac{y_{leader} - y_{AV}}{v_{AV} - v_{leader}} \tag{28}$$

where $y_{AV}$ is the longitudinal position of the CAV; $y_{leader}$ and $y_{follower}$ are the longitudinal positions of the vehicles ahead of and behind the CAV, respectively; and $v_{leader}$ and $v_{follower}$ are the longitudinal speeds of those vehicles.
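Equations (27)–(28) can be sketched as follows. The handling of a non-positive closing speed (no approach, hence no finite time-to-conflict) is our assumption, since the equations above only define the approaching case; names are illustrative.

```python
def safety_times(y_av, v_av, y_leader, v_leader, y_follower, v_follower):
    """Eqs. (27)-(28): time until the follower catches the CAV (t1)
    and until the CAV catches the leader (t2). Positions/speeds are
    longitudinal; returns inf when the gap is not closing (assumed)."""
    closing_rear = v_follower - v_av
    closing_front = v_av - v_leader
    t1 = (y_av - y_follower) / closing_rear if closing_rear > 0 else float("inf")
    t2 = (y_leader - y_av) / closing_front if closing_front > 0 else float("inf")
    return t1, t2
```

Small values of $t_1$ or $t_2$ place the state in the red region of the driving hazard diagram discussed below, and large values in the green region.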

A driving hazard diagram is proposed to represent the degree of danger of the vehicle's state based on the safety times $t_{safe-follower}$ and $t_{safe-leader}$. As shown in Figure 3, three primary colors are used to represent the degrees of danger. The red region represents danger of a collision accident; the deeper the color, the greater the likelihood. The yellow area indicates that the vehicle needs to watch for possible emergencies. The green area indicates that the vehicle is in a safe state. By dividing Figure 3 into five categories, the sub-reward functions were designed. On the basis of the above principles, a security-based reward function is proposed for training safe decisions. Its formulas are shown in Table 4.

**Figure 3.** The driving hazard diagram.

**Table 4.** The security-based reward function.


#### **5. Experiments**

#### *5.1. Parameter Setting*

In this section, we present the solid-interaction scene designed in SUMO. First, based on this scene, the proposed reward function is compared with a traditional reward function. The two algorithms were then used to train the four decision-making modes, and the differences between algorithms and between modes are compared. Finally, the performances of the four decision-making modes under both algorithms are evaluated. The main algorithm parameters are listed in Table 5.

**Table 5.** Parameters of the graph reinforcement learning.


#### *5.2. Evaluation Indexes*

To further evaluate the test performance of each decision-making mode, four kinds of evaluation indexes are proposed: an efficiency index, a safety index, a comfort index, and a task index.

1. Efficiency: The average longitudinal velocities of the CAVs and of all vehicles in the scenario are used to evaluate the traffic efficiency of the CAVs and the comprehensive efficiency of the total traffic flow under different decision modes. They are defined as follows:

$$\bar{v}_{AVs} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{m} v_{AV-ij}}{N} \tag{29}$$

$$\bar{v}_{all} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{m} v_{ij}}{N_{all}} \tag{30}$$

where $v_{AV-ij}$ represents the longitudinal velocity of the *i*th CAV detected at the *j*th time step, $v_{ij}$ is the longitudinal velocity of the *i*th vehicle detected at the *j*th time step, *N* is the total number of CAV longitudinal-velocity samples across all time steps, and $N_{all}$ is the total number of longitudinal-velocity samples for all vehicles across all time steps.

2. Safety: Since multiple CAVs are present in the test scenario, we define the collision rate as the probability that each CAV has a collision accident in each episode.

$$p\_{collision} = \frac{N\_{collision}}{N\_{CAV}}\tag{31}$$

where *Ncollision* is the number of collisions in a single episode. *NCAV* represents the number of CAVs.

3. Comfort: For the evaluation of comfort, we mainly studied the decelerations of all vehicles under different decision-making modes. The total number of emergency-braking events of all vehicles in a single episode, $N_{braking}$, is defined as the comfort evaluation index. Note that a deceleration in $(-\infty, -4.5\ \mathrm{m \cdot s^{-2}}]$ is regarded as emergency braking.

4. Task: Based on the solid interaction between CAVs and HVs in the test scenario, the driving task completion rates of CAVs and HV2 are used as the task indexes. Their functions are defined as follows.

$$p\_{CAV} = \frac{N\_{CAV-Exit1}}{N\_{CAV}}\tag{32}$$

$$p\_{HV2} = \frac{N\_{HV2-Exit1}}{N\_{HV2}}\tag{33}$$

where $N_{CAV-Exit1}$ is the number of CAVs that left via "Exit 1", $N_{HV2-Exit1}$ is the number of HV2 that left via "Exit 1", and $N_{HV2}$ represents the number of HV2.
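The evaluation indexes of Equations (29)–(33) can be computed over logged episode data as sketched below; the function signature and input layout (flat lists of velocity samples and event counts) are our own simplification.

```python
def evaluation_indexes(cav_speeds, all_speeds, n_collisions, n_cavs,
                       n_cav_exit1, n_hv2_exit1, n_hv2):
    """Eqs. (29)-(33) over one episode's logs.
    cav_speeds / all_speeds: flat lists of longitudinal-velocity samples."""
    return {
        "v_avg_cavs": sum(cav_speeds) / len(cav_speeds),   # Eq. (29)
        "v_avg_all": sum(all_speeds) / len(all_speeds),    # Eq. (30)
        "p_collision": n_collisions / n_cavs,              # Eq. (31)
        "p_cav": n_cav_exit1 / n_cavs,                     # Eq. (32)
        "p_hv2": n_hv2_exit1 / n_hv2,                      # Eq. (33)
    }
```

These five values, together with the emergency-braking count $N_{braking}$, are the quantities compared across decision-making modes in Section 5.5.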

#### *5.3. Validation of the Reward Function*

The proposed reward function was used in GQN and MDGQN to train the different decision-making modes while improving training efficiency. The traditional reward function of [27] considers only an intention reward, a speed reward, a lane-changing penalty, and a collision penalty. In this article, this traditional reward function is used as the baseline, and the rewards of the CONP decision-making mode during training are compared against it. To make the comparison fair, the reward values in the training process are normalized by the maximum and minimum reward values each function can achieve.

As shown in Figure 4, training with the proposed reward function rapidly finds the maximum reward. The training process with the traditional reward function suffers repeated reward drops, and the maximum reward is not found within the specified number of episodes. Comparing the reward fluctuations during training with the baseline also shows greatly improved stability. After the 200th episode, the average normalized rewards of Baseline-GQN, CONP-GQN, and CONP-MDGQN were 0.7399, 0.9882, and 0.9886, respectively, and their variances were 0.0871, 0.0028, and 0.0049. With the GQN algorithm, the average reward of the CONP decision-making mode was 35.40% higher than that of the baseline, and its variance was only 3.27% of the baseline's. In summary, the proposed reward function promotes fast convergence of the algorithm and greatly enhances the stability of the training process.

**Figure 4.** The comparison between the proposed reward function and baseline.

#### *5.4. Training of Different Decision-Making Modes*

As shown in Figure 5, under all four decision-making modes, the MDGQN algorithm converged faster than GQN, and the fluctuations during reward declines decreased significantly. This verifies that, by using the multi-step TD target and double Q-learning, MDGQN effectively alleviates the overestimation problem and explores the optimal action more quickly. Analyzing the MDGQN results, the AGGI decision-making mode converged by the 80th episode, whereas the AGGP decision-making mode did not fully converge until the 160th episode, which further verifies that the ratio of reward to punishment in the reward function affects the convergence speed of the algorithm. In summary, comparing the four decision-making modes, the aggressive decision-making modes use more punishment than the conservative modes, and the incentive decision-making modes promote convergence faster than the punished modes.

**Figure 5.** The training reward diagrams of four modes are compared and verified by the GQN and MDGQN algorithms. (**a**) The AGGP decision-making mode, (**b**) the CONP decision-making mode, (**c**) the AGGI decision-making mode, and (**d**) the CONI decision-making mode.

Furthermore, the trained decision-making modes were tested, and the means and variances of the test rewards were compared. As shown in Figure 6, MDGQN explored higher reward thresholds and averages than GQN in all four decision modes. Under the AGGP, AGGI, CONP, and CONI decision-making modes, the test-reward variances of MDGQN were 93.91, 80.26, 64.71, and 87.10, respectively, lower than the corresponding 108.51, 108.19, 93.78, and 97.01 for GQN. It can be concluded that the CONP decision mode has the strongest test stability but distinguishes algorithm differences poorly, whereas AGGI best reflects the differences between algorithms, at a slight cost in test stability.

**Figure 6.** Test rewards of the four decision-making modes. Orange represents the decision-making mode using the GQN algorithm, and green indicates the decision-making mode using the MDGQN algorithm. The two ends of each error bar represent the maximum and minimum test values.

#### *5.5. Evaluation of Decision-Making Modes*

Based on Figure 7, the two algorithms are compared first. In all four modes except CONI, the average speed of CAVs under MDGQN was higher than under GQN, and MDGQN outperformed GQN in the average speed of the total traffic flow under all four modes. However, the ratio of the CAVs' average speed increase to that of the total traffic flow is less than 1. In the CONI decision-making mode, the average speed of CAVs trained by MDGQN was lower than that of GQN, yet the total traffic flow speed still increased. This shows that, with the designed reward function, the MDGQN algorithm can train CAVs to make decisions that improve total traffic efficiency. As shown in Table 6, under the same decision-making mode, MDGQN had a lower collision rate, fewer emergency braking events, and higher arrival rates of CAVs and HV2 than GQN, except for a slight increase in emergency braking events under the CONP mode. The corresponding data explain this: under CONP, the *p<sub>HV2</sub>* of MDGQN was 0.717, which was 18.12% higher than that of GQN. As more HV2 vehicles reach "Exit 1", the uncertainty of the scene increases, raising the number of emergency braking events. From these results, it can be concluded that the comprehensive performance of the MDGQN algorithm is superior to that of GQN in this autonomous driving scene, and that MDGQN is better suited to strongly interactive, uncertain scenes.

Additionally, the four decision-making modes are compared. Compared with the conservative modes, the aggressive modes had a higher average speed, a higher collision rate, more emergency braking events, and a higher vehicle arrival rate. Compared with the punished modes, the incentive modes had a slightly higher average speed, a higher collision rate, more emergency braking events, and a further improved HV2 arrival rate. Among the four modes, AGGI had the highest traffic efficiency and the highest arrival rates of CAVs and HV2, but its collision rate and number of emergency braking events were also the highest. CONP had the lowest collision rate and fewest emergency braking events, but also the lowest traffic efficiency and the lowest arrival rates of CAVs and HV2. These performance differences are consistent with the weight-coefficient allocation of the proposed reward function matrix, which fully demonstrates that the matrix can effectively train various decision-making modes to adapt to autonomous driving scenarios.

**Figure 7.** Average longitudinal velocity of CAVs and total traffic flow under the four decision-making modes. CAV-GQN denotes the average longitudinal speed of CAVs trained with the GQN algorithm, and CAV-MDGQN that of CAVs trained with the MDGQN algorithm. V-GQN indicates the average longitudinal speed of the total traffic flow under GQN training, and V-MDGQN under MDGQN training. The two ends of each error bar indicate the maximum and minimum test values.


**Table 6.** Test evaluation results.

#### **6. Conclusions**

We proposed a reward function matrix comprising a decision-weighted coefficient matrix, an incentive-punished-weighted coefficient matrix, and a reward–penalty function matrix. By adjusting the weight coefficients of the reward function matrix, various decision-making modes can be trained. We trained four decision-making modes, namely AGGI, AGGP, CONI, and CONP, and used the GQN algorithm and the MDGQN algorithm (an improvement on GQN) for comparative verification. Extensive simulation results support three conclusions. First, the proposed reward function promotes fast convergence of the algorithm and greatly improves the stability of the training process. Taking the CONP decision-making mode as an example, the average normalized reward after the 200th episode was 35.40% higher than the baseline, while the variance was only 3.27% greater. Second, the comprehensive performance of MDGQN is superior to that of GQN: under all four decision-making modes, the means and variances of the test rewards of MDGQN are better than those of GQN, and its driving performance is better except for a slight increase in braking events in the CONP mode. Finally, the proposed reward function matrix can effectively train various decision-making modes to adapt to different autonomous driving scenarios. As the incentive weight increases, the contrast between the algorithms becomes more obvious, but safety decreases. In future work, we will further study the interactions of autonomous vehicles and decision-mode switching.

**Author Contributions:** Conceptualization, X.G. and Q.L.; methodology, X.G.; software, Q.L. and Z.L.; validation, X.G., Q.L. and X.L.; formal analysis, X.G.; investigation, F.Y.; resources, T.L.; data curation, X.G.; writing—original draft preparation, X.G.; writing—review and editing, Q.L., X.L. and Z.L.; visualization, X.G.; supervision, X.L.; project administration, X.L.; funding acquisition, X.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

During the training of the decision-making modes, we adjusted different weight coefficients to find modes with high comprehensive performance. To better validate the effectiveness of the proposed reward function matrix in training different decision patterns, we list four other sets of parameter settings, as shown in Table A1.



The simulation results are shown in Table A2. From these we can conclude that too large an increase in the proportion of safety leads to a decline in the completion rate of the driving tasks, especially the CAVs' loss of their driving-task objectives. Excessive increases in the proportions of the driving tasks and efficiency lead to a rapid decline in safety. An increase in incentive rewards lowers safety but raises the task completion rate and efficiency. Based on these results, four groups of reasonable parameters were selected, and the four decision-making modes CONP, CONI, AGGP, and AGGI were trained.


#### **References**


## *Article* **Local Path Planning of Autonomous Vehicle Based on an Improved Heuristic Bi-RRT Algorithm in Dynamic Obstacle Avoidance Environment**

**Xiao Zhang 1, Tong Zhu 2,\*, Lei Du 1, Yueqi Hu <sup>3</sup> and Haoxue Liu <sup>1</sup>**

<sup>2</sup> College of Transportation Engineering, Chang'an University, Xi'an 710064, China

<sup>3</sup> School of Vehicle Engineering, Xi'an Aeronautical Institute, Xi'an 710077, China

**\*** Correspondence: zhutong@chd.edu.cn; Tel.: +86-2982334729

**Abstract:** The existing variants of the rapidly exploring random tree (RRT) cannot be effectively applied to local path planning of autonomous vehicles, nor can they solve the coherence problem of paths between the front and back frames. Thus, an improved heuristic Bi-RRT algorithm is proposed that is suitable for obstacle avoidance by the vehicle in an unknown dynamic environment. A vehicle constraint reflecting the driver's driving habits and an obstacle-free direct connection mode for the two random trees are introduced. Multi-sampling biased towards the target state reduces invalid searches, and parent-node selection with a comprehensive measurement index accelerates the algorithm's execution while making the initial path gentle. An adaptive greedy step size, incorporating the target direction, expands nodes more effectively. Moreover, path reorganization minimizes redundant path points and makes the path's curvature continuous, and path coherence makes the paths of successive frames connect smoothly. Simulation analysis demonstrates the efficiency of the proposed algorithm, which generates the smoothest path in the shortest time compared with four other algorithms. Furthermore, experiments in dynamic environments show that the proposed algorithm can generate a differentiable, coherent path, ensuring the ride comfort and stability of the vehicle.

**Keywords:** autonomous vehicle; local path planning; Bi-RRT; path reorganization; path coherence

#### **1. Introduction**

The intelligent transportation system is a real-time, accurate, efficient, and comprehensive transportation management system that plays a role in many areas [1]. It can effectively improve road capacity, reduce traffic accidents, improve transportation efficiency, and alleviate traffic congestion [2,3]. Meanwhile, it can also reduce energy consumption and environmental pollution [4,5]. Therefore, it has become the future development direction of transportation systems and has attracted increasing attention from many countries. As a key component, the autonomous vehicle plays an essential role in the intelligent transportation system. It consists of an environmental perception layer, a path planning layer, and a path tracking control layer, and path planning has always been a core problem. Commonly, path planning refers to efficiently finding a collision-free, feasible path from a starting point to a target point in a workspace [6–8]. In practice, the quality of the planned path directly affects the vehicle's driving performance, so planning a traversable path that can be tracked is very important for autonomous vehicles.

Scholars have carried out much research on path planning, and new path-planning algorithms are constantly emerging and developing. In the previous studies, five common categories of path planning algorithms can be found: geometric algorithms [9,10], graph search algorithms [11,12], intelligent bionic algorithms [13,14], the artificial potential field algorithm [15], and sampling-based search algorithms. Sampling-based search algorithms,

**Citation:** Zhang, X.; Zhu, T.; Du, L.; Hu, Y.; Liu, H. Local Path Planning of Autonomous Vehicle Based on an Improved Heuristic Bi-RRT Algorithm in Dynamic Obstacle Avoidance Environment. *Sensors* **2022**, *22*, 7968. https://doi.org/ 10.3390/s22207968

Academic Editor: Chao Huang

Received: 20 September 2022 Accepted: 17 October 2022 Published: 19 October 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

<sup>1</sup> School of Automobile, Chang'an University, Xi'an 710064, China

including the rapidly exploring random tree (RRT) and the probabilistic roadmap, have effectively solved many challenging planning problems, especially in complex environments [16,17]. In path planning, the basic RRT algorithm is widely used for actuators with nonholonomic constraints because of its advantages, including probabilistic completeness, low computational cost, and no need to model the search space [18–20]. However, the basic RRT focuses only on finding a path, with little regard for convergence speed, search efficiency, or path optimality [21–23]. To overcome these shortcomings, scholars have made many improvements. The biased RRT uses a target-biased search to form an extended mode of nonrandom sampling, thus improving planning efficiency [24]. The bidirectional RRT (Bi-RRT) simultaneously grows two trees, from the starting point and the target point, to explore the search space, improving the algorithm's search efficiency [25]. The RRT-connect, a Bi-RRT version incorporating a greedy function, likewise grows two trees from the starting point and the target point, which reduces the search space and accelerates the convergence of the algorithm [26]. The RRT\* adds new steps, reselecting the parent node and rewiring the neighbors of each newly inserted node, to change the search mode, thus generating a path of optimal or near-optimal length [27]. These algorithms improve planning speed and path length, respectively. However, they do not take the steering constraints of the wheels into account, so they cannot be applied directly to path planning for the autonomous vehicle.

When the local environment is partially known or dynamic, if unknown or dynamic obstacles occupy the pre-generated global path at a certain moment, the autonomous vehicle will collide with them while tracking the path. Therefore, to avoid dynamic obstacles, the autonomous vehicle must regenerate a feasible path in real time according to the environmental information obtained from its perception module. Park et al. constructed an algorithm combining the A\* method and the artificial potential field method to solve online local path planning problems in a campus environment, guaranteeing real-time performance and shortest-path generation [28]. Chen et al. used a two-layered path planning structure consisting of a modified Bi-RRT based on the steering constraint and a vector-field-histogram-guided polynomial planning method to plan a safe, smooth path meeting real-time requirements [29]. Ge et al. utilized the resultant force of the potential field, the separating axis theorem, and the cubic B-spline to improve the Bi-RRT\* while taking the vehicle constraints into account, obtaining the smoothest path in the shortest time in complicated practical environments [30]. Qi et al. utilized a modified RRT\* to obtain an initial path, regarded the state tree structure as prior knowledge, and designed an approach to choose the best node among several candidates to regenerate the path quickly, thereby planning a path that avoids dynamic obstacles [31]. Zou et al. proposed a path-planning algorithm based on RRT and reinforcement learning optimization, which can generate a smooth, steady path in complex, unknown environments without colliding with obstacles [32]. Li et al. presented a real-time RRT-based path planning strategy consisting of a pre-processing RRT path planner and a real-time planner, which modifies the path rather than regenerating it to avoid unknown moving obstacles [33].
Peng et al. introduced a new way to choose candidate nodes, an incremental step size, and rapidly exploring random vines with a trajectory parameter space to form a kinematically constrained RRT-based path planning algorithm, which can find collision-free, kinematically feasible paths in various environments, such as dense environments and environments with narrow passages [34]. Wen et al. employed environmental knowledge to guide the planning procedure of the optimal RRT\* method, proposing a heuristic dual-sampling-domain-reduction-based optimal RRT scheme with a layered online path planning framework based on model predictive control, which outperforms traditional reduction schemes in the execution efficiency of RRT\* and is more reliable [35]. Niu et al. proposed a global dynamic path planning method based on an improved A\* algorithm and combined it with the dynamic window method to improve the real-time performance of the dynamic obstacle avoidance of the intelligent vehicle [36]. Wu et al. utilized the genetic algorithm to optimize the path length and turning angle to obtain a short, smooth path [37]. These path planning algorithms improve the length and smoothness of the path and can be applied in dynamic environments. However, they seldom consider the curvature consistency of paths across multiple frames, i.e., the requirement that there be no sudden change between the path planned in the current frame and the path planned in the previous frame.

Furthermore, path optimization can effectively reduce the control difficulty of autonomous vehicles with nonholonomic constraints. Ge et al. applied the cubic B-spline directly to optimize the path, resulting in a planned path with more turns [30]. Lu et al. utilized Dubins curves to generate a path, but the curvature of the generated path is discontinuous [38]. Yang et al. used path pruning to delete unnecessary path nodes without considering the included angles between the line segments connecting path nodes, resulting in excessive curvature of the final planned path [39]. Chen et al. adopted path pruning based on inserted points to simplify the initial path without considering the influence of the inserted points on the path length, so the length of the final planned path may not be optimal [29]. Thus, it is necessary to develop a path optimization method that considers both path pruning and smoothing.

Therefore, in this article, an improved heuristic Bi-RRT path planning algorithm is proposed to solve local path planning problems in dynamic environments by considering path length and the continuity of path curvature between frames. The improved heuristic Bi-RRT algorithm contributes improvements to node sampling, node selection, node extension, the interconnection mode of the two trees, path reorganization, and the coherence of path curvature between frames.


The article is organized as follows: Section 2 discusses the Bi-RRT path planning algorithm, the simplified vehicle model, and the differences in path planning between front and back frames. The improved heuristic Bi-RRT algorithm is presented in Section 3. Section 4 presents simulation experiments to demonstrate the effectiveness and practicability of the proposed algorithm. Section 5 presents the discussion about its performance, and conclusions are provided thereafter.

#### **2. Problem Statements**

The basic Bi-RRT algorithm is briefly described in this section, and its shortcomings are pointed out. A brief introduction of the vehicle model shows that the turning radius constraint of the vehicle must be considered in the path planning process. Furthermore, the influence on vehicle driving of the difference between the path planning results of the front and rear frames in dynamic path planning is described.

#### *2.1. Basic Bi-RRT*

The Bi-RRT is a variant of the basic RRT, which changes the expansion mode of the algorithm. That is, two random trees are constructed from the initial state and the target state, respectively. In each cycle, a random tree is first expanded to generate a new tree node, and then another random tree also starts to generate a new tree node, making two random trees expand towards each other. The two random trees expand alternately until the nodes of the two random trees meet. The searching schematic diagram of the basic Bi-RRT is shown in Figure 1.

**Figure 1.** Schematic diagram of basic Bi-RRT expansion.

Algorithm 1 shows the basic Bi-RRT algorithm. Once initialized, the basic Bi-RRT algorithm conducts its iterative circle by selecting a random point *Prand* from the configuration space using the sampling function *Random*\_*State* ( ) (Line 3). The algorithm then determines a near tree node *Pnear* by the function *Nearest*\_*Neighbor* ( ) and obtains a new tree node *Pnew* by the function *Extend* ( ) (Lines 4–5). If there are no obstacles between *Pnear* and *Pnew*, the new tree node *Pnew* is added to the random tree *Ta*, and the nearest tree node *Pnearest* from the random tree *Tb* is found by the function *Nearest*\_*Neighbor* ( ) (Lines 6–8). The iterative circle terminates if the distance between *Pnew* and *Pnearest* is less than *lthreshold* (Lines 9–11). Otherwise, *Ta* and *Tb* are swapped, and the procedures mentioned above are executed on the random tree *Tb* again (Line 12). Additionally, then, a path is generated by the function *Get*\_*Path T* ( ) (Line 16).
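The loop described above can be sketched as follows (a minimal Python illustration of the Algorithm 1 structure under simplifying assumptions: 2D points, a trivially permissive collision check, and no parent bookkeeping; the helper names mirror the functions in the text but the implementation is our own):

```python
import math
import random

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def extend(p_near, p_rand, step=1.0):
    """Move from p_near toward p_rand by at most `step` (the Extend() of Algorithm 1)."""
    d = dist(p_near, p_rand)
    if d <= step:
        return p_rand
    t = step / d
    return (p_near[0] + t * (p_rand[0] - p_near[0]),
            p_near[1] + t * (p_rand[1] - p_near[1]))

def bi_rrt(p_init, p_goal, l_threshold=1.0, step=1.0, max_iter=5000,
           collision_free=lambda a, b: True):
    """Minimal Bi-RRT main loop following the structure of Algorithm 1."""
    ta, tb = [p_init], [p_goal]                                   # the two random trees
    for _ in range(max_iter):
        p_rand = (random.uniform(0, 20), random.uniform(0, 20))   # Random_State()
        p_near = min(ta, key=lambda p: dist(p, p_rand))           # Nearest_Neighbor()
        p_new = extend(p_near, p_rand, step)                      # Extend()
        if collision_free(p_near, p_new):
            ta.append(p_new)
            p_nearest = min(tb, key=lambda p: dist(p, p_new))     # nearest in other tree
            if dist(p_new, p_nearest) < l_threshold:
                return ta, tb                                     # the trees have met
        ta, tb = tb, ta                                           # Swap(Ta, Tb)
    return None
```

A full implementation would also record each node's parent index so that *Get*\_*Path T* ( ) can walk the trees back to their roots, and would replace the permissive `collision_free` with a real obstacle check.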


Algorithm 2 outlines the implementation procedure of the function *Get*\_*Path T* ( ). Once the basic Bi-RRT algorithm completes the construction of two random trees, *Ta* and *Tb*, two path point sets, *path*\_*a* and *path*\_*b*, are defined, and the last added tree nodes of the two random trees are put into two sets, *path*\_*a* and *path*\_*b*, respectively (Lines 1–3). Then, the two random trees are searched reversely according to indexes of parent nodes until their initial points are put into the path point sets, *path*\_*a* and *path*\_*b*, respectively (Lines 4–17). Finally, the path point set *path*\_*a* is reversed and then combined with the path point set *path*\_*b* to obtain a final path point set *path* (Lines 18–19).

**Algorithm 2:** Function *Get*\_*Path T* ( *Bi* − *T*)

```
1: Var path_a, path_b;
2: path_a.Add_Node (Ta.noden);
3: path_b.Add_Node (Tb.noden);
4: while 1 do
5: i ← Indexpre_node (Ta);
6: path_a.Back_Add_Node (Ta.nodei);
7: if i = 1 then
8: break
9: end if
10: end while
11: while 1 do
12: j ← Indexpre_node(Tb);
13: path_b.Back_Add_Node (Tb.nodej);
14: if j = 1 then
15: break
16: end if
17: end while
18: path_a ← reverse (path_a);
19: path ← path_a ∪ path_b ;
20: Return path
```
The basic Bi-RRT algorithm simultaneously generates two random trees from the starting and target points and expands them towards each other, accelerating the convergence speed of the algorithm. However, the expansion of tree nodes still lacks directivity, the connective mode of the two trees can be further improved, and the generated path is difficult for the vehicle to track directly.

#### *2.2. Vehicle Kinematical Model*

Since the Bi-RRT is an incremental path planning algorithm, the kinematic vehicle model can be used to limit the expansion of tree nodes and ensure the feasibility of the path; that is, the nonholonomic constraint of the vehicle should be considered when adding new tree nodes. Because a fully detailed kinematic model is rarely practical for planning, a simplified theoretical motion model is used, as shown in Figure 2. Assuming that the vehicle does not slip laterally and the rear wheels do not steer, the vehicle kinematic model is expressed by Equation (1), and the steering radius of the vehicle is bounded as in Equation (2).

$$\begin{cases} \dot{x} = v \cos \varphi \\ \dot{y} = v \sin \varphi \\ \dot{\varphi} = \frac{v \tan \delta\_f}{L} \end{cases} \tag{1}$$

$$|k| \le k\_{\max} = \frac{1}{L} \tan \delta\_{f\max} = \frac{1}{R\_{\min}} \tag{2}$$

where (*x*, *y*) represents the coordinate of the vehicle's center of gravity in the reference frame, *v* is the longitudinal speed of the vehicle, *ϕ* is the angle between the vehicle's main axis and the *X* axis, *δ<sub>f</sub>* is the steering angle of the front wheels with |*δ<sub>f</sub>*| ≤ *δ<sub>f</sub>*<sub>max</sub>, *L* is the wheelbase of the vehicle, and *k* and *R* are the turning curvature and steering radius of the vehicle, respectively.

**Figure 2.** Schematic of vehicle kinematic model.

In order to consider the feasibility of newly generated path segments, the minimum steering radius constraint should be taken into account during the node extension procedure of Bi-RRT.
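Equations (1) and (2) can be exercised with a short numerical sketch (illustrative Python of our own; the wheelbase and steering-limit values are hypothetical examples, not taken from the article):

```python
import math

def step_bicycle(x, y, phi, v, delta_f, L, dt):
    """One explicit-Euler step of the kinematic model in Equation (1)."""
    x += v * math.cos(phi) * dt
    y += v * math.sin(phi) * dt
    phi += v * math.tan(delta_f) / L * dt
    return x, y, phi

def curvature_feasible(k, L, delta_f_max):
    """Check the curvature bound of Equation (2): |k| <= tan(delta_f_max) / L."""
    return abs(k) <= math.tan(delta_f_max) / L

# Example: a 2.7 m wheelbase and a 30-degree steering limit give the minimum
# steering radius R_min = L / tan(delta_f_max) used to gate node extension.
L_wb = 2.7
d_max = math.radians(30.0)
r_min = L_wb / math.tan(d_max)
```

In the node extension procedure, any candidate path segment whose curvature fails `curvature_feasible` would be rejected, which is exactly the minimum-steering-radius constraint discussed above.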

#### *2.3. Path Planning Difference between Previous and Subsequent Frames*

Because the length and smoothness of the path are calculated from the environmental information collected in a single frame, considering only the length and smoothness of the path cannot guarantee the steady driving of the autonomous vehicle. If the path planned in the current frame deviates too far from that of the previous frame, the driving stability of the autonomous vehicle will decline, and the vehicle may even collide with an obstacle vehicle. As can be seen from Figure 3, the path of the previous frame passes the obstacle on the left, whereas the path of the current frame passes it on the right. Because of this inconsistency between the paths of the previous and subsequent frames, the autonomous vehicle may fail to avoid the obstacle and risks colliding with it. The dotted arrow represents the possible actual driving direction of the autonomous vehicle.

**Figure 3.** Path difference between current and previous frames.

Consequently, to prevent the difference in the paths generated by the previous and current frames of two adjacent planning cycles from influencing vehicle driving stability, it is necessary to consider the path information of the previous frame when planning the path in the current frame.

#### **3. Improved Heuristic Bi-RRT Algorithm**

Based on the above analysis, this section proposes an improved heuristic Bi-RRT algorithm for path planning in a dynamic obstacle avoidance environment. Figure 4 illustrates the model structure of the improved heuristic Bi-RRT. The inputs of the proposed model are the local driving environment and positioning information. Based on heuristic random sampling, heuristic nearest-neighbor selection, heuristic extension, collision detection, direct connection detection, and path reorganization, the proposed algorithm quickly generates a differentiable, collision-free path. Algorithm 3 shows the specific steps of the improved heuristic Bi-RRT algorithm.

**Figure 4.** Model structure of the improved heuristic Bi-RRT.

The improved heuristic Bi-RRT algorithm obtains an initial point with the function *Current*\_*Root* ( ) and initializes two random trees *Ta* and *Tb* in the same manner as the basic Bi-RRT (Lines 1–2). While the two random trees cannot be directly interconnected, the improved heuristic Bi-RRT iterates by picking a random point in the feasible domain space with the sampling function *Heuristic*\_*Random*\_*Sampling* ( ) (Lines 4–5). Then, the parent node *Pnear* of tree *Ta* is found by the function *Heuristic*\_*Nearest*\_*Neighbor* ( ), and the new tree node *Pnew* is generated by the function *Heuristic*\_*Extend* ( ) (Lines 6–7). If there are no obstacles between *Pnear* and *Pnew*, the new tree node *Pnew* is added to the random tree *Ta*, and the nearest node *Pnearest* of tree *Tb* is found (Lines 8–11). The iteration ends if there are no obstacles between *Pjudge*<sup>1</sup> and *Pjudge*<sup>2</sup>, i.e., between *Pnew* and *Pnearest* (Lines 13–15). Otherwise, the random trees *Ta* and *Tb* are swapped, and the procedures above are executed on the other random tree again (Line 16). Afterward, an initial path is generated by the function *Get*\_*Path T* ( ) (Line 20). Path reorganization, including path node reconnection and path smoothing, processes the initial path into a feasible path (Lines 21–22).


**Algorithm 3:** Improved heuristic Bi-RRT algorithm

```
1: Pinit ← Current_Root ( );
2: Ta (Pinit); Tb (Pgoal);
3: Pjudge1 ← Pinit; Pjudge2 ← Pgoal;
4: while Obstacle_Collision (Pjudge1, Pjudge2) do
5: Prand ← Heuristic_Random_Sampling ( );
6: Pnear ← Heuristic_Nearest_Neighbor (Prand, Ta);
7: Pnew ← Heuristic_Extend (Pnear, Prand);
8: if Collision_Free (Pnear, Pnew) then
9: Ta.Add (Pnew), Ta.Add (Pnear, Pnew)
10: Pjudge1 ← Pnew;
11: Pnearest ← Nearest_Neighbor (Pnew, Tb);
12: Pjudge2 ← Pnearest;
13: if Collision_Free (Pjudge1, Pjudge2) then
14: Return T (Ta, Tb)
15: break
16: else Swap (Ta, Tb)
17: end if
18: end if
19: end while
20: Path ← Get_Path T (Ta, Tb);
21: Path ← Heuristic_Reconnection (Path);
22: Trajectory ← Smoothing (Path);
```

The improved heuristic Bi-RRT algorithm contains a set of heuristic methods that benefit path planning for the autonomous vehicle. The improvements in the connective mode of the two random trees, node sampling, node selection, and node expansion within the Bi-RRT framework are used to generate an initial path quickly. Then, path reorganization, including path reconnection and path smoothing, is employed to improve the quality of the initial path, making it suitable for tracking by the autonomous vehicle. The improved constraints, comprising the improved road environment and the improved vehicle constraint, are conducive to generating a feasible path that complies with the driver's driving habits. In addition, a novel path coherence method is introduced so that consecutively generated paths connect smoothly when planning a dynamic obstacle avoidance path. The specific methods of the algorithm are described in detail as follows.

#### *3.1. Connective Mode of Two Random Trees*

To further accelerate the algorithm, the connective mode of the two random trees is changed from requiring the distance between them to fall below a threshold to an obstacle-free direct connection, as shown in Figure 5. After expanding the random tree *Ta* to obtain a new tree node *Pnew*−*a*, the node *Pnearest* on the random tree *Tb* closest to *Pnew*−*<sup>a</sup>* is determined, and whether the area between *Pnew*−*<sup>a</sup>* and *Pnearest* is passable is checked. If there are no obstacles between them, *Pnew*−*<sup>a</sup>* and *Pnearest* are directly connected, and the initial path planning is complete. Otherwise, the algorithm continues to execute until the two random trees are successfully connected.

**Figure 5.** Obstacle-free direct connection of two random trees.
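The passability check between *Pnew*−*a* and *Pnearest* can be sketched as a sampled line-of-sight test (illustrative Python under the simplifying assumption of circular obstacles; function and parameter names are hypothetical):

```python
import math

def segment_clear(p, q, obstacles, n_checks=20):
    """Obstacle-free direct-connection test between a new node and the nearest node
    of the other tree. Obstacles are (cx, cy, radius) circles; the segment p-q is
    sampled at n_checks + 1 evenly spaced points."""
    for i in range(n_checks + 1):
        t = i / n_checks
        x = p[0] + t * (q[0] - p[0])
        y = p[1] + t * (q[1] - p[1])
        for (cx, cy, r) in obstacles:
            if math.hypot(x - cx, y - cy) <= r:
                return False           # the segment passes through an obstacle
    return True                        # Pnew-a and Pnearest can be connected directly
```

In practice the sampling resolution (or an exact segment-circle intersection test) must be chosen fine enough that thin obstacles cannot slip between consecutive check points.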

#### *3.2. Heuristic Target Bias Sampling Method*

The basic Bi-RRT usually samples randomly over the global scope, which generates random points without guidance and causes much unnecessary computation. To address this problem, a heuristic bias sampling strategy composed of a multiple-sampling method and a target bias method is adopted to make the random tree grow in a biased direction, enabling the initial state and the goal state to meet more effectively and faster. Equations (3) and (4) show how the random sampling state *Prand* is calculated.

$$P\_{randm} = \mathit{Heuristic\\_Random\\_State}\,(\;) \tag{3}$$

$$P\_{rand} = \begin{cases} P\_{randm}, & d\_{ob} \le d\_{threshold} \\ P\_{randm} + \chi \cdot \dfrac{\overrightarrow{P\_{randm} P\_{target}}}{\left\| \overrightarrow{P\_{randm} P\_{target}} \right\|}, & d\_{ob} > d\_{threshold} \end{cases} \tag{4}$$

where *Prandm* is the random sampling state generated by the multiple-sampling method, *Ptarget* is the target point defined in each sampling process, and *χ* is the biased step size. *d*<sub>ob</sub> and *d*<sub>threshold</sub> are the distance from the random point *Prandm* to the obstacle and the distance threshold from the obstacle, respectively.

#### 3.2.1. Heuristic Random Sampling

With knowledge of the initial and goal states, to bring the sampled random point closer to the target state, the *Multiple*\_*Random*\_*State* function generates several random points in the free region instead of the single random point generated by the basic Bi-RRT's *Random*\_*State* function. The multiple-sampling function does not include a probability of bias towards the target, hence avoiding the local minimum problem. Then, the *Nearest*\_*To*\_*Target* function selects, from these candidate sampling points, the random state closest to the goal; this point becomes the chosen random state *Prandm*, steering the Bi-RRT's search towards the target point. The number of candidate random states was set to two, since simulation results showed the best performance with this value.

#### 3.2.2. Heuristic Target Bias

After the chosen random state *Prandm* is obtained, the target bias method is introduced. It expands the random point a step size *χ* along the direction from *Prandm* to the target state, generating a random sampling point still closer to the target state, which makes the random tree grow more directionally and improves computational efficiency. However, when *dob* is less than *dthreshold*, a random sampling point biased further toward the target state makes the newly generated tree node likely to collide with the obstacle, leading to sampling failure and directly affecting the solution speed. Therefore, the target bias method introduces the distance threshold *dthreshold*, which is the projection distance of the semi-major axis of the expanded safety ellipse on the *x* axis of the coordinate reference frame. When *dob* ≤ *dthreshold*, that is, the chosen random state *Prandm* is close to the obstacle, *Prandm* is regarded as the generated random sampling point *Prand*; when *dob* > *dthreshold*, that is, *Prandm* is far from the obstacle, *Prand* is generated by the target bias function. This strategy maintains the balance between exploration and search speed.

Based on the basic Bi-RRT sampling function framework, the novel target bias sampling method changes the generation mode of the sampling point to make the generated sampling points more directional and effective, improving search efficiency and accelerating the algorithm's convergence. Algorithm 4 outlines the working of the function *Heuristic*\_*Random*\_*Sampling* ( ). The improved heuristic Bi-RRT obtains candidate sampling points through the sampling function *Random*\_*State* ( ) and selects the sampling point with the minimum distance cost as the current sampling point *Prandm* (Lines 1–3). Then, according to the distance between the current sampling point *Prandm* and the obstacle, the current sampling point *Prandm* is processed by the heuristic target bias method to obtain the final sampling point *Prand* (Lines 4–7). This procedure is performed in each sampling process.

**Algorithm 4:** Function *Heuristic*\_*Random*\_*Sampling* ( )

```
1: Prand1 ← Random_State ( );
2: Prand2 ← Random_State ( );
3: Prandm ← Nearest_To_Target (Prand1, Prand2);
4: if dob ≤ dthreshold then
5:     Prand ← Prandm ;
6: else
7:     Prand ← Prandm + χ · (Prandm→Ptarget / |Prandm→Ptarget|);
8: end if
```
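As a concrete illustration, the sampling strategy of Algorithm 4 can be sketched in Python as follows. The uniform sampler, the single point obstacle, and the parameter values (*χ*, *dthreshold*) are illustrative assumptions, not the authors' implementation.

```python
import math
import random

def random_state(x_range, y_range):
    """Uniform random sample in the free region (basic Bi-RRT sampler)."""
    return (random.uniform(*x_range), random.uniform(*y_range))

def dist(p, q):
    return math.hypot(q[0] - p[0], q[1] - p[1])

def heuristic_random_sampling(target, obstacle, x_range, y_range,
                              chi=1.0, d_threshold=3.0):
    # Multiple sampling: draw two candidates instead of one.
    p1 = random_state(x_range, y_range)
    p2 = random_state(x_range, y_range)
    # Nearest_To_Target: keep the candidate closer to the goal.
    p_randm = min((p1, p2), key=lambda p: dist(p, target))
    # Target bias: only step toward the target when far from the obstacle.
    d_ob = dist(p_randm, obstacle)
    if d_ob <= d_threshold:
        return p_randm
    norm = dist(p_randm, target)
    if norm == 0.0:
        return p_randm
    ux = (target[0] - p_randm[0]) / norm
    uy = (target[1] - p_randm[1]) / norm
    return (p_randm[0] + chi * ux, p_randm[1] + chi * uy)
```

Near an obstacle the candidate is returned unchanged, so exploration is preserved exactly where the target bias would otherwise cause collisions.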

#### *3.3. Heuristic Parent Node Selection Method*

In the basic Bi-RRT algorithm, the nearest tree node is selected by the Euclidean distance from the random point to each tree node, which may generate a polyline path with sharp included angles between the connecting line segments of path nodes. Even if such a polyline path is smoothed, it cannot be successfully followed by vehicles, because the vehicle has a minimum steering radius in actual motion; hence, the Euclidean distance should not be the only factor considered during each node selection. As shown in Figure 6, the current vehicle state is a solid rectangle containing a yellow vehicle, with three alternative driving states shown as dashed rectangles. If only the Euclidean distance is considered when finding the nearest driving state, the first driving state is undoubtedly the closest, with a Euclidean distance of zero, and the second driving state takes second place. However, the first and second driving states, namely in situ steering and lateral translation, are not possible for the vehicle due to the minimum steering radius. Hence, the third driving state is more reasonable.

**Figure 6.** Different driving states.

Given the above analysis, the steering radius of the vehicle needs to be considered to choose the near tree node, namely the parent tree node. Therefore, a heuristic selection method of the parent tree node, namely a comprehensive measurement index considering the distance factor and angle factor, is introduced to make the generated path gentler and easily tracked by the vehicle. Furthermore, it can speed up the algorithm convergence and reduce the calculation time. Equations (5)–(11) show the calculation process when selecting the near tree node. The near tree node *Pnear* is determined as the tree node corresponding to the maximum comprehensive measurement index.

$$C\_{M} = \omega\_1 \cdot C^{1}\_{distance} + \omega\_2 \cdot C^{1}\_{angle} \tag{5}$$

$$C\_{distance} = \xi\_1 \cdot C\_{distance1} + \xi\_2 \cdot C\_{distance2} \tag{6}$$

$$C\_{distance1} = \left|\overrightarrow{P\_{tree}P\_{rand}}\right| \tag{7}$$

$$C\_{distance2} = \left|\overrightarrow{P\_{tree}P\_{goal}}\right| \tag{8}$$

$$C\_{angle} = \pi - \left( \cos^{-1} \left( \frac{\overrightarrow{P\_i P\_j} \cdot \overrightarrow{P\_i P\_{rand}}}{\left| \overrightarrow{P\_i P\_j} \right| \cdot \left| \overrightarrow{P\_i P\_{rand}} \right|} \right) \right) \tag{9}$$

$$C^{1}\_{distance} = \frac{C\_{distance\,\max} - C\_{distance}}{C\_{distance\,\max}} \tag{10}$$

$$C^{1}\_{angle} = \frac{C\_{angle\,\max} - C\_{angle}}{C\_{angle\,\max}} \tag{11}$$

where *ω*<sup>1</sup> and *ω*<sup>2</sup> are the weighted coefficients of the distance index *C*<sup>1</sup>*distance* and the angle index *C*<sup>1</sup>*angle*, respectively. *C*<sup>1</sup>*distance* and *C*<sup>1</sup>*angle* are the normalized values of the distance *Cdistance* and the angle *Cangle*, respectively. The diagrammatic presentation of the angle *Cangle* is shown in Figure 7. *Cdistance* is the weighted sum of the distances *Cdistance*<sup>1</sup> and *Cdistance*<sup>2</sup>, where *ξ*<sup>1</sup> and *ξ*<sup>2</sup> are their weighted coefficients, respectively. *Cdistance*<sup>1</sup> represents the distance from each tree node to the random sampling node, and *Cdistance*<sup>2</sup> represents the distance from each tree node to the target state. The introduction of the distance *Cdistance*<sup>2</sup> makes the selected near tree node tend toward the target state, increasing computational efficiency.

**Figure 7.** Calculation of the included angle.

The transition from the Euclidean distance to the comprehensive measurement index changes the growing characteristic of the random tree. Instead of expanding blindly through the configuration space, the tree is driven to grow toward the target state along a gentler trend. As a result, the running speed of the algorithm improves to some extent, with a natural trade-off between quick tree growth and better path quality.
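The parent node selection of Equations (5)–(11) can be sketched as follows; the weights *ω*, *ξ* and the example node layout are assumed values, and the root node is assumed to carry a valid parent position for the angle term.

```python
import math

def included_angle(p_i, p_j, p_rand):
    """C_angle of Eq. (9): pi minus the included angle at tree node p_i
    between its parent p_j and the random sample p_rand."""
    v1 = (p_j[0] - p_i[0], p_j[1] - p_i[1])
    v2 = (p_rand[0] - p_i[0], p_rand[1] - p_i[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n = math.hypot(*v1) * math.hypot(*v2)
    return math.pi - math.acos(max(-1.0, min(1.0, dot / n)))

def select_near_node(tree, parents, p_rand, p_goal,
                     w1=0.5, w2=0.5, xi1=0.7, xi2=0.3):
    """Return the index of the near (parent) tree node maximizing C_M."""
    # Raw distance and angle costs per tree node (Eqs. 6-9).
    c_dist = [xi1 * math.dist(p, p_rand) + xi2 * math.dist(p, p_goal)
              for p in tree]
    c_ang = [included_angle(p, parents[i], p_rand)
             for i, p in enumerate(tree)]
    d_max, a_max = max(c_dist), max(c_ang)
    best_i, best_cm = 0, -1.0
    for i in range(len(tree)):
        # Normalization of Eqs. (10)-(11): smaller cost -> larger index.
        c1_d = (d_max - c_dist[i]) / d_max if d_max > 0 else 0.0
        c1_a = (a_max - c_ang[i]) / a_max if a_max > 0 else 0.0
        cm = w1 * c1_d + w2 * c1_a          # Eq. (5)
        if cm > best_cm:
            best_i, best_cm = i, cm
    return best_i
```

The node with the maximum index, not the minimum Euclidean distance, becomes the parent, which is what discourages in situ steering and lateral translation candidates.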

#### *3.4. Heuristic Node Extension Method*

In the basic Bi-RRT, the near tree node is usually extended toward the random point with a fixed step size. A large fixed step size tends to make the new node difficult to extend in the surrounding area of an obstacle, especially in dense obstacle areas, resulting in extension failure and reducing the growth rate of the random tree. A small fixed step size can lead to slow convergence of the algorithm. In other words, the fixed step size suffers from low flexibility and low security. In addition, the expansion direction of the tree node is along the vector from *Pnear* to *Prand*. When a large step size is used to extend the parent tree node along an expansion direction deviating from the target state, invalid and unnecessary nodes can be generated, lowering the quality of the generated path and affecting the running speed of the algorithm. Thus, to solve these two problems, a heuristic node extension method, namely the adaptive greedy step size, is introduced. 'Adaptive' refers to the step size changing dynamically and gradually rather than being fixed, and 'greedy' refers to the step size growing larger the closer the node expansion direction is to the direction of the target state. Equations (12)–(16) show the calculation of the adaptive greedy step size, and a simple legend is shown in Figure 8.

$$\lambda\_{adaptive\\_greedy} = \begin{cases} \lambda & d\_{near\\_ob} < d\_{threshold} \\[6pt] (\eta\_{\cos} + \sqrt{s}) \cdot \lambda & d\_{near\\_ob} \ge d\_{threshold},\ \beta \le \frac{\pi}{2} \\[6pt] (1 - \eta\_{\cos} + \sqrt{s}) \cdot \lambda & d\_{near\\_ob} \ge d\_{threshold},\ \beta > \frac{\pi}{2} \end{cases} \tag{12}$$

$$\eta\_{\cos} = \left|\, \overrightarrow{V\_{unit\\_r}} \cdot \overrightarrow{V\_{unit\\_t}} \,\right| \tag{13}$$

$$\beta = \arccos\left( \frac{\overrightarrow{V\_{unit\\_t}} \cdot \overrightarrow{V\_{unit\\_r}}}{\left|\overrightarrow{V\_{unit\\_t}}\right| \cdot \left|\overrightarrow{V\_{unit\\_r}}\right|} \right) \tag{14}$$

$$\overrightarrow{V\_{unit\\_t}} = \frac{\overrightarrow{P\_{near}P\_{target}}}{\left|\overrightarrow{P\_{near}P\_{target}}\right|} \tag{15}$$

$$\overrightarrow{V\_{unit\\_r}} = \frac{\overrightarrow{P\_{near}P\_{rand}}}{\left|\overrightarrow{P\_{near}P\_{rand}}\right|} \tag{16}$$

where *λ* is the basic lower bound of the step size, and $\overrightarrow{V\_{unit\\_t}}$ and $\overrightarrow{V\_{unit\\_r}}$ are the unit vectors from *Pnear* to *Ptarget* and from *Pnear* to *Prand*, respectively. *β* is the included angle between $\overrightarrow{V\_{unit\\_t}}$ and $\overrightarrow{V\_{unit\\_r}}$, *η*cos is the absolute value of the cosine of *β*, *s* is a constant regulating coefficient, and *dnear*\_*ob* is the distance from *Pnear* to the obstacle. Being far away from the obstacle means *dnear*\_*ob* ≥ *dthreshold*.

**Figure 8.** Adaptive greedy step size.

Compared with the traditional fixed step size, the adaptive greedy step size is no longer a constant. Its size is related not only to the distance from the obstacle but also to the included angle between the vector $\overrightarrow{P\_{near}P\_{rand}}$ and the direction toward the target state. In this way, when near an obstacle, a small step size is used for the basic extension, making the search path more detailed and safer. When far from an obstacle, the step size is greedily adjusted as the included angle changes, making the random tree adopt a larger step size to expand rapidly along extension directions closer to the target state and reducing blind expansions. As a result, the greedy mode of the step size accelerates the growth of the random tree and makes the path tend to approach the target state to a certain extent. It is exactly because of this dynamic adjustability that the adaptive greedy step size extension can often pass the obstacle detection when approaching an obstacle, unlike the fixed step size extension. Using the adaptive greedy step size can significantly improve the success rate of new node generation and effectively accelerate overall search efficiency.
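The adaptive greedy step of Equations (12)–(16) can be sketched as follows; the parameter values (*λ*, *s*, *dthreshold*) are illustrative assumptions, and the case split on the sign of cos *β* reflects the reconstruction of Equation (12).

```python
import math

def unit(v):
    """Normalize a 2D vector."""
    n = math.hypot(*v)
    return (v[0] / n, v[1] / n)

def adaptive_greedy_step(p_near, p_rand, p_target, d_near_ob,
                         lam=1.0, s=0.25, d_threshold=3.0):
    """Step size lambda_adaptive_greedy of Eq. (12)."""
    if d_near_ob < d_threshold:
        return lam                      # near an obstacle: basic small step
    vt = unit((p_target[0] - p_near[0], p_target[1] - p_near[1]))  # Eq. (15)
    vr = unit((p_rand[0] - p_near[0], p_rand[1] - p_near[1]))      # Eq. (16)
    cos_beta = vt[0] * vr[0] + vt[1] * vr[1]
    eta = abs(cos_beta)                 # eta_cos of Eq. (13)
    if cos_beta >= 0:                   # extension roughly toward the target
        return (eta + math.sqrt(s)) * lam
    return (1 - eta + math.sqrt(s)) * lam
```

An extension aligned with the target receives up to (1 + √s)·λ, while one pointing away from it shrinks toward √s·λ, which is the greedy behavior described above.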

#### *3.5. Improved Constraints*

If a path is to be successfully and effectively tracked by an autonomous vehicle, it should not only meet the obstacle constraints but also comply with the driving characteristics of an actual driver, thereby avoiding excessive tension and discomfort for the driver and passengers. The obstacle constraints include the road environment and the environmental constraint formed by treating the obstacle vehicle as an obstacle, namely, the vehicle constraint. Therefore, the road environment and the vehicle constraint are improved to obtain a feasible path that meets the driving habits of an actual driver.

#### 3.5.1. Improved Road Environment

In order to generate a practical path, the path nodes need to meet the constraints of road environments when planning. The road environment restricts the planned path more effectively by considering the vehicle width to avoid collision between the vehicle and the road edge. Thus, the newly extended nodes need to meet the requirements of the following Equation (17), and the schematic diagram of the road environment is shown in Figure 9.

$$\begin{cases} P\_{initial\\_x} < T\_{node\\_x} < P\_{goal\\_x} \\[4pt] B\_r + \frac{W\_h}{2} < T\_{node\\_y} < B\_l - \frac{W\_h}{2} \end{cases} \tag{17}$$

where *Pinitial*\_*<sup>x</sup>* and *Pgoal*\_*<sup>x</sup>* are the *x* coordinates of the initial point and the goal point in the reference coordinate frame, respectively. *Bl* and *Br* are the left and right boundaries of the road, and *Wh* is the width of the host vehicle.

**Figure 9.** The improved road environment constraint.

#### 3.5.2. Improved Vehicle Constraint

In order to express the constraint generated by the obstacle vehicle conveniently, a safety ellipse is introduced to envelop the obstacle vehicle. When considering the subjective comfort of the driver and passengers, a planned path needs to be able to avoid the obstacle vehicle in advance, that is, avoiding the generation of excessive path curvature when approaching the obstacle vehicle. In this case, the planned path can avoid causing discomfort to the driver and passengers and be easily tracked by the vehicle. Therefore, on the basis that half of the vehicle length is used as the semi-major axis of the safety ellipse, the advanced obstacle avoidance distance is added to the semi-major axis to ensure that the vehicle obstacle avoidance occurs at a long distance. Equation (18) expresses the condition that the newly expanded node needs to meet, and the schematic diagram of the vehicle constraint is shown in Figure 10.

$$\begin{cases} \dfrac{\left(x - x\_{ob}\right)^2}{\left[s\_{f1} \cdot \left(d\_{safe} + \frac{L\_o}{2}\right)\right]^2} + \dfrac{\left(y - y\_{ob}\right)^2}{\left(s\_{f2} \cdot W\_o\right)^2} > 1 \\[10pt] d\_{safe} = \dfrac{v^2}{2 \cdot \eta \cdot g} \end{cases} \tag{18}$$

where (*x*, *y*) is a point on the connecting segment between the newly extended node *Pnew* and its parent node *Pnear*, (*xob*, *yob*) is the position of the obstacle vehicle, and *Lo* and *Wo* are the length and width of the obstacle vehicle, respectively. *sf*<sup>1</sup> and *sf*<sup>2</sup> are the expansion coefficients of the semi-major axis and the semi-minor axis of the safety ellipse, respectively, *v* is the speed of the host vehicle, *η* is the friction coefficient, *g* is the acceleration of gravity, and *dsafe* is the advanced obstacle avoidance distance.

**Figure 10.** The improved vehicle constraint.

To sum up, during the process of the specific node extension, the improved constraints, including the improved road environment and the improved vehicle constraint, can make the planned path meet the actual driving requirement of the vehicle, namely, the path feasibility and the characteristics of avoiding the obstacle in advance.
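The two improved constraints of Equations (17) and (18) reduce to simple membership tests on each newly extended node, sketched below; all geometry and coefficient values are illustrative assumptions.

```python
def satisfies_road(node, p_init_x, p_goal_x, b_left, b_right, w_h):
    """Eq. (17): the node stays between the initial and goal x coordinates
    and inside the road boundaries, shrunk by half the host-vehicle width."""
    x, y = node
    return (p_init_x < x < p_goal_x and
            b_right + w_h / 2 < y < b_left - w_h / 2)

def satisfies_safety_ellipse(point, obstacle, l_o, w_o, v,
                             sf1=1.2, sf2=1.2, eta=0.8, g=9.81):
    """Eq. (18): the point must lie outside the expanded safety ellipse whose
    semi-major axis includes the advance avoidance distance d_safe."""
    d_safe = v ** 2 / (2 * eta * g)      # advanced obstacle avoidance distance
    a = sf1 * (d_safe + l_o / 2)         # expanded semi-major axis
    b = sf2 * w_o                        # expanded semi-minor axis
    dx = point[0] - obstacle[0]
    dy = point[1] - obstacle[1]
    return (dx / a) ** 2 + (dy / b) ** 2 > 1.0
```

A candidate node is accepted only when both predicates hold for it and for the points along the segment to its parent, which is how obstacle avoidance begins at the enlarged, speed-dependent distance *dsafe*.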

#### *3.6. Path Reorganization Method*

Because the basic Bi-RRT algorithm adopts random sampling, the obtained path, also named the initial path, often has poor quality, mainly reflected in too many unnecessary turn points and discontinuous path curvature. When tracking the initial path, the autonomous vehicle has to stop and change its driving direction, so it cannot drive smoothly and suffers unnecessary mechanical wear. Thus, a path reorganization method is needed to optimize the initial path, that is, to remove unnecessary turn points and smooth the path.

After an initial path is obtained by the function *Get*\_*Path T* (*Bi* − *T*), as shown in Algorithm 2, the path reorganization method, comprising path node reconnection and path smoothing, is used to process it. The path node reconnection removes unnecessary turn nodes and inserts path nodes to replace some existing path nodes. As a result, the included angles of the connecting lines between path points meet the vehicle steering requirement, decreasing the control difficulty of the autonomous vehicle while reducing the path distance to the maximum extent. The path obtained by the path node reconnection is still a broken line composed of discrete path nodes; thus, it needs a further smoothing process so the vehicle can drive smoothly and steadily. Algorithm 5 describes the pseudocode of the path reorganization method.

#### 3.6.1. Path Node Reconnection

A path obtained by the function *Get*\_*Path T* (*Bi* − *T*) usually has poor connectivity due to the random nature of the algorithm, and it contains many redundant turning segments. As a result, it is often neither continuously differentiable nor feasible, so it must undergo path node reconnection before path smoothing. Path node reconnection, shown in Lines 1–43 of Algorithm 5, removes redundant path nodes and inserts path nodes to replace some existing ones, yielding a new collision-free path with the maximum length reduction. Moreover, it ensures that the complementary angles of the included angles between successive line segments are less than the steering angle of the front wheel. The function *Get*\_*Path T* (*Bi* − *T*) obtains the path node set *S*<sup>0</sup> of the initial path, ordered from the forward node of the goal node to the root node of the initial node (Line 2). The first path node is defined as the root node *proot*, the second as the parent node *pparent*, and the third as the current node *pcurrent* (Line 3). A line segment connects the current node *pcurrent* to the first subsequent path node in the set *S*<sup>2</sup>; then *pcurrent* is connected to each subsequent path node of *S*<sup>2</sup> in sequence until a segment collides with an obstacle (Lines 7–10). In that case, the path nodes spanned by the last collision-free segment are removed, and the parent of the node causing the collision is defined as the previous forward node *pforward*\_*last* (Line 11). Meanwhile, the complementary angle of the included angle between the vector $\overrightarrow{p\_{current}p\_{parent}}$ and the vector $\overrightarrow{p\_{current}p\_{forward\\_last}}$ is calculated. When this complementary angle is less than the steering angle constraint *δ<sup>f</sup>*, *proot* is redefined as the previous root node *proot*\_*last*, *pparent* as the new root node *proot*, *pcurrent* as the new parent node *pparent*, and *pforward*\_*last* as the new current node *pcurrent* (Lines 12–13). Otherwise, a path node *pinsert* is inserted between *pparent* and *pforward*\_*last* to replace *pcurrent*, chosen so that the complementary angles of the included angles between the vectors $\overrightarrow{p\_{parent}p\_{insert}}$ and $\overrightarrow{p\_{parent}p\_{root}}$ and between the vectors $\overrightarrow{p\_{insert}p\_{parent}}$ and $\overrightarrow{p\_{insert}p\_{forward\\_last}}$ are less than *δ<sup>f</sup>*, and so that the segments connecting *pparent* to *pinsert* and *pinsert* to *pforward*\_*last* do not collide with the obstacle (Lines 17–18). After that, *proot* is redefined as *proot*\_*last*, *pparent* as the new *proot*, *pinsert* as the new *pparent*, and *pforward*\_*last* as the new *pcurrent* (Line 19). These operations are applied to the remaining path nodes in *S*<sup>2</sup> in sequence until the path node *pn*−<sup>1</sup> is reached. Finally, if the complementary angle of the included angle between the segment connecting the final *pcurrent* and *pparent* and the segment connecting the final *pcurrent* and *pn* is less than *δ<sup>f</sup>*, *pn* is added to the set *S*<sup>1</sup> (Lines 29–30).

Otherwise, a node *pinsert* is inserted (Lines 31–43) so that three complementary angles are all less than the steering angle constraint *δ<sup>f</sup>*: the angle between the segment from the final *proot* to *proot*\_*last* and the segment from the final *proot* to *pinsert*; the angle between the segment from *pinsert* to the final *proot* and the segment from *pinsert* to the final *pcurrent*; and the angle between the segment from the final *pcurrent* to *pinsert* and the segment from the final *pcurrent* to *pn*. In addition, the segments connecting the final *proot* to *pinsert* and *pinsert* to the final *pcurrent* must not collide with the obstacle. Once these conditions are met, the final path node *pn* is added to the set *S*<sup>1</sup>.

The path node reconnection process is illustrated in Figure 11. The solid black line is the initial path obtained by the function *Get*\_*Path T* (*Bi* − *T*). The connecting line segments between the black nodes *P*1, *P*2, *P*3, *P*4, *P*5, and *P*6, obtained by the path node reconnection, constitute the path to be smoothed in the next step, represented as the solid red line. The connecting line segments between the black nodes *P*2, *P*3, *P*4, and *P*<sup>5</sup> do not intersect with the obstacle, so the redundant nodes between them, namely, the green path nodes, are removed. The original path node between *P*<sup>2</sup> and *P*<sup>4</sup> could connect directly with *P*<sup>2</sup> and *P*4; however, the complementary angle of the included angle between its connecting segments to *P*<sup>2</sup> and *P*<sup>4</sup> exceeds the steering angle constraint *δ<sup>f</sup>*. As a result, a new node *P*<sup>3</sup> is inserted between *P*<sup>2</sup> and *P*<sup>4</sup> based on the constraint *δ<sup>f</sup>* to replace it, yielding a relatively gentle path with the maximum length reduction, namely the red path. Furthermore, the complementary angles of the included angles of the line segments formed by these black nodes are less than *δ<sup>f</sup>*, that is, *β*1, *β*2, *β*3, and *β*<sup>4</sup> are all less than *δ<sup>f</sup>*. Thus, the path node reconnection method reduces the length of the initial path to the greatest extent and yields a path convenient for further smoothing.

```
Algorithm 5: Function Path_Reorganization ( )
1:  Var S0, S1, S2 : path
2:  S0 (pn, ..., p2, p1, p0) ← Get_Path T (Bi−T);
3:  proot ← p0 ; pparent ← p1 ; pcurrent ← p2 ;
4:  S1 ← (proot, pparent, pcurrent);
5:  S2 ← (p3, ..., pn);
6:  while pcurrent != pn−1 do
7:      for each node pi ∈ S2 do
8:          if Collision_Free (pcurrent, pi) then
9:              pforward ← pi ;
10:         else
11:             pforward_last ← pi−1 ;
12:             if (π − Angle (pcurrent→pparent, pcurrent→pforward_last)) < δf ≤ δf_max then
13:                 proot_last ← proot ; proot ← pparent ; pparent ← pcurrent ; pcurrent ← pforward_last ;
14:                 S1.Backward_Add_Node (pcurrent);
15:             else
16:                 while 1 do
17:                     pinsert ← Insert_Node (pparent, pforward_last);
18:                     if (π − Angle (pparent→pinsert, pparent→proot)) < δf ≤ δf_max
                            and (π − Angle (pinsert→pparent, pinsert→pforward_last)) < δf ≤ δf_max
                            and Collision_Free (pparent, pinsert)
                            and Collision_Free (pinsert, pforward_last) then
19:                         proot_last ← proot ; proot ← pparent ; pparent ← pinsert ; pcurrent ← pforward_last ;
20:                         S1.Backward_Add_Node (pinsert);
21:                         S1.Backward_Add_Node (pcurrent);
22:                         break
23:                     end if
24:                 end while
25:             end if
26:         end if
27:     end for
28: end while
29: if (π − Angle (pcurrent→pparent, pcurrent→pn)) < δf ≤ δf_max then
30:     S1.Backward_Add_Node (pn);
31: else
32:     while 1 do
33:         pinsert ← Insert_Node (proot, pcurrent);
34:         if (π − Angle (proot→proot_last, proot→pinsert)) < δf ≤ δf_max
                and (π − Angle (pinsert→pcurrent, pinsert→proot)) < δf ≤ δf_max
                and (π − Angle (pcurrent→pinsert, pcurrent→pn)) < δf ≤ δf_max
                and Collision_Free (proot, pinsert)
                and Collision_Free (pinsert, pcurrent) then
35:             S1.Backward_Delete_Node (pend);
36:             S1.Backward_Delete_Node (pend−1);
37:             S1.Backward_Add_Node (pinsert);
38:             S1.Backward_Add_Node (pcurrent);
39:             S1.Backward_Add_Node (pn);
40:             break;
41:         end if
42:     end while
43: end if
44: trajectory ← Cubic_Bspline (S1);
45: Return trajectory
```
**Figure 11.** Path node reconnection processing.
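The core shortcut-and-turn-check idea of the path node reconnection can be sketched as follows. The collision checker and steering threshold are assumptions, and the node insertion fallback of Algorithm 5 (Lines 16–24) is abstracted to simply keeping the original next node.

```python
import math

def turn_angle(p_prev, p, p_next):
    """Heading change at p, i.e., the complementary angle of the included
    angle between the incoming and outgoing segments."""
    v1 = (p[0] - p_prev[0], p[1] - p_prev[1])
    v2 = (p_next[0] - p[0], p_next[1] - p[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n = math.hypot(*v1) * math.hypot(*v2)
    return math.acos(max(-1.0, min(1.0, dot / n)))

def reconnect(path, collision_free, delta_f):
    """Greedily connect each kept node to the farthest subsequent node whose
    straight segment is collision-free and whose turn is below delta_f."""
    kept = [path[0]]
    i = 0
    while i < len(path) - 1:
        # Farthest node reachable from path[i] by a collision-free segment.
        j = i + 1
        for k in range(len(path) - 1, i, -1):
            if collision_free(path[i], path[k]):
                j = k
                break
        # Keep the shortcut only if the resulting turn is gentle enough;
        # otherwise fall back to the original next node (insertion omitted).
        if len(kept) >= 2 and turn_angle(kept[-2], kept[-1], path[j]) >= delta_f:
            j = i + 1
        kept.append(path[j])
        i = j
    return kept
```

On a collinear four-node path the sketch collapses everything to the two endpoints, mirroring how the green redundant nodes between *P*2 and *P*5 are removed in Figure 11.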

#### 3.6.2. Path Smoothing

After the path node reconnection process, a simplified path, the solid red line in Figure 11, is obtained, but this path still contains some necessary turning nodes. As a result, it is not continuously differentiable and cannot be directly tracked by the vehicle. Thus, the path should be further smoothed so that it can be successfully tracked by the vehicle [40]. With the cubic B-spline curve, moving a control point makes only a local modification without affecting the overall shape of the path, and it has a simple implementation and relatively low computational cost, ensuring kinematic feasibility while avoiding obstacles [41]. In addition, it is often adopted to obtain a continuously differentiable curve [42]. Therefore, the cubic B-spline curve is used to optimize the obtained path.

Supposing there are *m* + 1 control points *Pi*(*i* = 0, 1, ···, *m*), the cubic B-spline curve is expressed as

$$\begin{cases} P\_{k,3}(t) = \sum\limits\_{i=0}^{3} P\_{i+k} \cdot G\_{i,3}(t) & t \in [0,1) \\[4pt] k = 0, 1, \cdots, m-3 \end{cases} \tag{19}$$

where the basic function *Gi*,3(*t*) is defined as

$$\begin{cases} G\_{0,3}(t) = \frac{-t^3 + 3t^2 - 3t + 1}{6} \\ G\_{1,3}(t) = \frac{3t^3 - 6t^2 + 4}{6} \\ G\_{2,3}(t) = \frac{-3t^3 + 3t^2 + 3t + 1}{6} \\ G\_{3,3}(t) = \frac{t^3}{6} \end{cases} \tag{20}$$

In order to make the cubic B-spline curve start from *P*0, tangent to the vector $\overrightarrow{P\_0 P\_1}$, and end at *Pm*, tangent to the vector $\overrightarrow{P\_m P\_{m-1}}$, the control points *P*−<sup>1</sup> and *Pm*+<sup>1</sup> are added to meet the conditions *P*−<sup>1</sup> + *P*<sup>1</sup> = 2*P*<sup>0</sup> and *Pm*−<sup>1</sup> + *Pm*+<sup>1</sup> = 2*Pm*.

The path reorganization method makes the generated path continuously differentiable while reducing the path length to the maximum extent. Meanwhile, the generated path has no unnecessary steering and meets the tracking requirement of the vehicle.
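The clamped cubic B-spline of Equations (19) and (20) can be evaluated directly; the control points and sampling resolution below are illustrative.

```python
def basis(t):
    """The four cubic B-spline basis functions G_{i,3}(t) of Eq. (20)."""
    return [(-t**3 + 3*t**2 - 3*t + 1) / 6,
            (3*t**3 - 6*t**2 + 4) / 6,
            (-3*t**3 + 3*t**2 + 3*t + 1) / 6,
            t**3 / 6]

def cubic_bspline(ctrl, samples_per_segment=20):
    """Evaluate P_{k,3}(t) of Eq. (19) for k = 0..m-3, with the endpoint
    control points P_{-1} = 2*P_0 - P_1 and P_{m+1} = 2*P_m - P_{m-1}
    added so the curve starts at P_0 and ends at P_m."""
    p_front = (2 * ctrl[0][0] - ctrl[1][0], 2 * ctrl[0][1] - ctrl[1][1])
    p_back = (2 * ctrl[-1][0] - ctrl[-2][0], 2 * ctrl[-1][1] - ctrl[-2][1])
    p = [p_front] + list(ctrl) + [p_back]
    pts = []
    for k in range(len(p) - 3):
        for i in range(samples_per_segment):
            t = i / samples_per_segment
            g = basis(t)
            pts.append((sum(g[j] * p[k + j][0] for j in range(4)),
                        sum(g[j] * p[k + j][1] for j in range(4))))
    # Close the curve exactly at P_m (t = 1 on the last segment).
    g = basis(1.0)
    pts.append((sum(g[j] * p[-4 + j][0] for j in range(4)),
                sum(g[j] * p[-4 + j][1] for j in range(4))))
    return pts
```

Because *P*−<sup>1</sup> and *Pm*+<sup>1</sup> mirror the first and last control points, the evaluated curve interpolates the endpoints of the reconnected path while only locally bending around interior control points.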

#### *3.7. Path Coherence Method*

Due to the randomness of the algorithm, the difference between the planned paths of two adjacent frames can easily cause the vehicle to shake during driving. If the difference is too large, it may even cause the vehicle to collide with an obstacle. Hence, to solve this path planning problem in dynamic environments, the interframe path coherence method is introduced to guarantee the smooth connection of planned paths between frames. It requires that the planned path of the previous frame be considered when planning a path in the current frame. Its main idea is that, at the beginning of each new planning cycle, the root node *Proot*\_*newly* of the new planning cycle is the position on the trajectory planned in the previous cycle whose distance from that trajectory's root node equals *R*. At the same time, a point along the tangent line at the newly generated root node is selected as the new search starting node *Pinitial*\_*newly*. Equations (21) and (22) express the calculation of the root node *Proot*\_*newly*, and Equation (23) shows how to calculate *Pinitial*\_*newly*.

$$\begin{cases} P\_{k,3}(t) = \sum\limits\_{i=0}^{3} P\_{i+k} \cdot G\_{i,3}(t) \quad t \in [0,1),\; k = 0, 1, \cdots, m-3 \\[6pt] \left(x - P\_{root\\_x}\right)^2 + \left(y - P\_{root\\_y}\right)^2 = R^2 \end{cases} \tag{21}$$

$$R = \rho \cdot v \tag{22}$$

$$\begin{cases} \vartheta = \tan^{-1}(B\_{slope}) \\[4pt] P\_{initial\\_newly} = P\_{root\\_newly} + L\_{skew} \cdot (\cos\vartheta, \sin\vartheta) \end{cases} \tag{23}$$

where *Proot* is the root node of the previous planning cycle, *R* is the offset distance from the root node *Proot*, and *ρ* is the coefficient of proportionality. *R* is greater than the preview distance of the vehicle to ensure that the preview point is still on the trajectory of the previous frame; as a result, no additional tracking control adjustments are needed. *Proot*\_*newly* is the right intersection point of the cubic B-spline curve and the designed circle. *Bslope* is the slope of the cubic B-spline curve at the intersection point, *ϑ* is the included angle between the tangent line and the *x* axis at the intersection point, and *Lskew* is the offset distance along the direction of the tangent line. *Proot*\_*newly* and *Pinitial*\_*newly* ensure that the trajectory generated in the current frame passes through the intersection point *Proot*\_*newly* and is tangent to the tangent line there, so that the interframe paths are smoothly connected. As seen in Figure 12, the solid green line is the broken line formed by the path control points of the previous frame, and the solid black line is the broken line formed by the path control points of the current frame. The red curve represents the trajectory generated in the previous frame, the blue curve represents the trajectory generated in the current frame, and both curves are tangent to the vector $\overrightarrow{P\_{root\\_newly}P\_{initial\\_newly}}$. Hence, the two paths are smoothly connected at *Proot*\_*newly*.

The path coherence method can significantly eliminate the difference between interframe trajectories and make the planned trajectory smooth and continuous, maintaining the vehicle's overall stability during driving.
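The interframe coherence step of Equations (21)–(23) can be sketched as follows. The dense polyline standing in for the previous B-spline trajectory, the finite-difference tangent, and the parameter values (*ρ*, *Lskew*) are assumptions for illustration.

```python
import math

def new_cycle_nodes(prev_traj, v, rho=0.5, l_skew=2.0):
    """Return (P_root_newly, P_initial_newly) for the next planning cycle."""
    root = prev_traj[0]
    R = rho * v                                   # Eq. (22)
    # First trajectory point at distance >= R from the previous root: the
    # "right intersection" with the circle of Eq. (21) on a dense polyline.
    idx = next(i for i, p in enumerate(prev_traj)
               if math.dist(p, root) >= R)
    p_root_newly = prev_traj[idx]
    # Tangent direction (slope B_slope) approximated from the next point.
    q = prev_traj[min(idx + 1, len(prev_traj) - 1)]
    theta = math.atan2(q[1] - p_root_newly[1], q[0] - p_root_newly[0])
    # Eq. (23): offset the search start along the tangent line.
    p_initial_newly = (p_root_newly[0] + l_skew * math.cos(theta),
                       p_root_newly[1] + l_skew * math.sin(theta))
    return p_root_newly, p_initial_newly
```

Starting the new search from *Pinitial*\_*newly* along the tangent is what forces the current-frame spline to pass through *Proot*\_*newly* tangentially, giving the smooth interframe connection described above.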

#### **4. Simulation Results and Analysis**

#### *4.1. Simulation Environment*

In order to verify the performance of the improved heuristic Bi-RRT algorithm, two typical road scenarios, a straight road scenario and a curved road scenario, as shown in Figure 13a,b, are considered separately. Road information and other parameters used in the simulations are listed in Tables 1 and 2. The performance of different RRT variants is compared in the static straight and curved road scenarios to verify the superiority of the improved heuristic Bi-RRT method. Meanwhile, interframe path planning and tracking are performed in dynamic scenarios to verify the effectiveness of the proposed algorithm. The simulations were performed on a PC with an Intel i7 processor using MATLAB R2019b and CarSim 2018. MATLAB R2019b is mathematical software developed by MathWorks in Massachusetts, USA, and CarSim 2018 is vehicle system simulation software developed by Mechanical Simulation Corporation in Michigan, USA.


**Figure 13.** Road scenarios. (**a**) Straight road scenario. (**b**) Curved road scenario.

**Table 1.** Road parameters.


**Table 2.** Simulation parameters.


#### *4.2. Performance Measure of Path Planning*

The improved heuristic Bi-RRT algorithm is compared with the basic RRT, the biased RRT, the Bi-RRT, and the RRT\* in the straight and curved road scenarios, and the results are shown in Figure 14a,b, respectively. The blue dotted line represents the path planned by the basic RRT, the red dashed line represents the path planned by the biased RRT, and the black dash-dotted line represents the path planned by the RRT\*. The solid green and black lines represent the paths planned by the Bi-RRT and the improved heuristic Bi-RRT, respectively. It can be seen in Figure 14a,b that the paths generated by the basic RRT, the Bi-RRT, and the biased RRT contain a large number of corners with frequent large curvature changes. In contrast, the path generated by the RRT\* is relatively gentle, but it is still a broken line, which does not meet the steering requirement of the vehicle. The path generated by the improved heuristic Bi-RRT, however, is a continuous and smooth curve with continuous curvature, as shown in Figure 15a,b, so it can be followed by the vehicle conveniently. As Figure 14a,b show, the paths planned by the basic RRT, the biased RRT, the Bi-RRT, the RRT\*, and the improved heuristic Bi-RRT can all successfully avoid the obstacle vehicle, but their obstacle avoidance modes differ. The paths planned by the basic RRT, the biased RRT, the Bi-RRT, and the RRT\* always start emergency obstacle avoidance only when approaching the obstacle vehicle. The path planned by the proposed algorithm starts obstacle avoidance in advance, a safe distance from the obstacle vehicle, ensuring safe driving, conforming to the driver's behavior habits, and not bringing excessive tension to passengers. The proposed algorithm can realize obstacle avoidance in advance because the obstacle avoidance distance is embedded when constructing the obstacle vehicle constraint.

**Figure 14.** The simulation paths. (**a**) The planned paths in the straight road scenario. (**b**) The planned paths in the curved road scenario.

**Figure 15.** The simulation path curvatures. (**a**) The path curvature in straight road scenario. (**b**) The path curvature in curved road scenario.

A set of benchmarking parameters is defined to objectively compare the performance of the improved heuristic Bi-RRT and the other RRT variants. The number of nodes on the mature random tree is denoted by "tree nodes." The length of the generated path is denoted by "path length." The searching time of path planning is denoted by "time." In addition, the number of segments comprising the generated path is denoted by "path segments"; in particular, the path segments of the improved heuristic Bi-RRT refer to the number of segments of the path after being processed by path reconnection [30,43,44].
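For a planned path given as a polyline of control points, two of these measures can be computed directly, as in this small sketch (the function name is illustrative):

```python
import math

def path_metrics(points):
    """Compute benchmarking measures for a planned path given as a polyline
    of (x, y) control points: total path length and segment count."""
    length = sum(math.hypot(x1 - x0, y1 - y0)
                 for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return {"path_length": length, "path_segments": len(points) - 1}
```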

Thirty independent simulation experiments were performed for each algorithm to offset the random deviation of a single experiment, and the results are shown in Tables 3 and 4. The average path length generated by the RRT\* algorithm is much lower than those of the basic RRT, the biased RRT, and the Bi-RRT because the RRT\*, embedded with the reconnection mechanism, can approach the approximately shortest path. However, the average path length generated by the improved heuristic Bi-RRT algorithm is smaller still, and its average tree nodes and average path segments are also the smallest among the five algorithms. The decrease in path length and number of path segments is mainly due to the introduction of the path node reconnection method. The relatively small number of path segments reduces the control difficulty of the vehicle and keeps the further smoothed path free of unnecessary turns. In terms of average planning time, the improved heuristic Bi-RRT algorithm is also optimal among the five algorithms, and in the straight road scenario in particular its planning time is reduced significantly.


**Table 3.** Performance measures in the straight road scenario.

**Table 4.** Performance measures in the curved road scenario.


As a result of the performance measure, the improved heuristic Bi-RRT algorithm shows superior performance regarding path quality and planning time compared with the basic RRT, the biased RRT, the Bi-RRT, and the RRT\*.

#### *4.3. Path Coherence Validation*

A dynamic environment can usually be treated as a sequence of static environments in the dynamic path planning process; that is, paths need to be re-planned in each frame of the environment region obtained by the perception module. Thus, not only the running speed of the path planning algorithm but also the interframe connection of the planned paths must be considered. To further verify the performance of the improved heuristic Bi-RRT algorithm, the planning effect of the algorithm at specific frames is described in detail.

To demonstrate the coherence between consecutive frame paths, real-time re-planning is carried out in two dynamic driving scenarios, where the host vehicle and the obstacle vehicle move in opposite directions and in the same direction, respectively. The red triangle represents the position of the obstacle vehicle in the current frame, the yellow arrow represents its driving direction, and the green and black solid points represent the start point and end point of the current frame, respectively.

Figure 16a–c show the dynamic path planning process of the improved heuristic Bi-RRT algorithm for three consecutive frames in the dynamic scenario with driving in opposite directions. The paths obtained in the first, second, and third frames are smooth and successfully bypass the moving obstacle vehicle. It can be seen from Figure 16d that the paths (magenta lines) obtained in the three frames can be smoothly connected to form a smooth coherence path, as shown in Figure 16e. Moreover, the curvature of the coherence path is continuous and varies within a small range, which makes it convenient for the host vehicle to track, as shown in Figure 16f.

Figure 17a–f show the dynamic path planning process of the improved heuristic Bi-RRT algorithm for six consecutive frames in the dynamic scenario with driving in the same direction. The paths obtained in all six frames are smooth and also successfully bypass the moving obstacle vehicle. As seen in Figure 17g, the paths (magenta lines) obtained in the six frames can be smoothly connected to generate a smooth coherence path, as shown in Figure 17h. In addition, the curvature of the obtained coherence path is continuous and within the range of 0.02 1/m, as shown in Figure 17i, so the host vehicle can easily track the coherence path.

**Figure 16.** The path planning results in a dynamic scenario with driving in opposite directions. (**a**) The planned path in the first frame. (**b**) The planned path in the second frame. (**c**) The planned path in the third frame. (**d**) The path coherence process of three frames. (**e**) The coherence path of three frames. (**f**) The curvature of the coherence path.

**Figure 17.** The path planning results in a dynamic scenario with driving in the same direction. (**a**) The planned path in the first frame. (**b**) The planned path in the second frame. (**c**) The planned path in the third frame. (**d**) The planned path in the fourth frame. (**e**) The planned path in the fifth frame. (**f**) The planned path in the sixth frame. (**g**) The path coherence process of six frames. (**h**) The coherence path of six frames. (**i**) The curvature of the coherence path.

To further verify the tracking performance of the obtained coherence paths shown in Figures 16e and 17h, path following experiments are carried out on the Carsim simulation platform. Appropriate vehicle and driver models are selected to simulate the driving state of a real vehicle. The solid black and red dashed lines represent the target and followed paths, respectively.

The path following experiment in the dynamic scenario with driving in opposite directions is conducted first. The result shown in Figure 18a presents the target path and the followed path. The following error between the followed path and the target path is relatively small and within the range of 0.06 m, as shown in Figure 18b, so the following performance is acceptable. The yaw rate is within the range of 8 deg/s, as shown in Figure 18c, and the lateral acceleration is within the range of 0.25 g, less than the usual maximum lateral acceleration of 0.8 g for steady driving, as shown in Figure 18d. These two indicators prove that the dynamic performance of the host vehicle is stable in the simulation experiment. Based on these facts, the obtained coherence path is satisfactory and effective.

**Figure 18.** The path tracking results of the coherence path in a dynamic scenario with driving in opposite directions. (**a**) The path following result. (**b**) The path following error. (**c**) The yaw velocity. (**d**) The lateral acceleration.

The path following experiment in the dynamic scenario with driving in the same direction is then conducted. The target path and the followed path are shown in Figure 19a. The following error between the followed path and the target path is within the range of 0.07 m, as shown in Figure 19b, and is relatively small, which shows that the coherence path has a satisfactory tracking effect. The yaw rate and the lateral acceleration of the host vehicle when following the trajectory are shown in Figure 19c,d, respectively. The yaw rate is within the range of 7 deg/s, and the lateral acceleration is within the range of 0.2 g, less than the usual maximum lateral acceleration of 0.8 g for steady driving, which shows the excellent dynamic performance of the host vehicle in the simulation process. Thus, the obtained coherence path is feasible and acceptable.

**Figure 19.** The path tracking results of the coherence path in a dynamic scenario with driving in the same direction. (**a**) The path following result. (**b**) The path following error. (**c**) The yaw velocity. (**d**) The lateral acceleration.

The test results show that the algorithm proposed in this article can plan a path with smooth transition connections and continuous curvature. Furthermore, it is applied successfully in the dynamic driving scenarios of opposite-direction and same-direction driving, which are the most common in vehicle driving.

#### **5. Discussion**

The path planning of a vehicle in dynamic scenarios concerns not only the quality of each frame's path but also the difference between paths of consecutive frames. The Bi-RRT algorithm, a variant of the basic RRT algorithm, is often used for path planning because of its probabilistic completeness and rapidity. However, its planned path is not differentiable and is relatively long. Based on these facts, several heuristic improvements are introduced to the Bi-RRT algorithm to make it suitable for the dynamic path planning of a vehicle. The multi-sampling method biased towards the target state makes the growth of the random tree directional and reasonable. The adaptive greedy step size considering the target direction increases the success rate of node expansion and makes the newly extended node tend towards the target state to a certain extent. The two random trees are interconnected directly when there are no obstacles between them, which further reduces the running time of the algorithm. By considering the distance from the target state and the included angles between the connection lines of tree nodes rather than the Euclidean distance alone, the parent node selection method improves the running speed of the algorithm to a certain extent while making the generated path gentler. The path reorganization method, including path node reconnection and path smoothing, removes unnecessary turning points, significantly reduces the path length, and yields a path with continuous curvature. As for the problem of path connection between frames, the path coherence method makes the paths of different frames connect smoothly to form a differentiable path.
The simulation experiments show that the improved heuristic Bi-RRT algorithm achieves real-time performance, especially in the straight road scenario, and guarantees the shortest path while obeying the road and vehicle constraints and considering the driver's driving habits. By contrast, the vehicle constraint and the driver's driving habits are not considered in the basic Bi-RRT algorithm.

However, it must be admitted that the running time may not be fast enough when the proposed algorithm is applied in a high-speed, dynamic curved road scene, which may be due to the large number of numerical calculations based on graphic geometry in obstacle detection. In future research, obstacle detection based on gray-value comparison and parallel computing will be introduced to further reduce the algorithm's running time and meet the path planning requirements of high-speed curved road scenarios. Once the perception module is ready, changing obstacle detection from numerical calculations on graphic geometry to gray-value comparison on binarized images can greatly reduce the calculation amount and thus speed up the search. On this basis, the improved heuristic Bi-RRT algorithm can be applied to more complex driving scenarios, such as unstructured road scenes with a mixture of static and moving obstacles and grooves, to explore its adaptability. Furthermore, comparisons with other related algorithms will help identify the algorithm's shortcomings and the scenes where it cannot be applied, so that it can be improved and extended to more driving scenarios.

#### **6. Conclusions**

This paper is concerned with the path planning of autonomous vehicles in a dynamic environment. An improved heuristic Bi-RRT algorithm has been proposed and tested. The proposed algorithm can solve the path query problem of the basic Bi-RRT algorithm and the interconnection problem of paths between various frames in a dynamic scenario to obtain a smooth and asymptotically optimal path with continuous curvature with high efficiency and accuracy. The proposed path planning algorithm consists of the obstacle-free direct connection of two trees, the heuristic target bias sampling, the heuristic parent node selection, the heuristic node extension, the improved constraints, path reorganization, and path coherence. The obstacle-free direct connection mode can further accelerate the interconnection of the two random trees. The heuristic target bias sampling can reduce blind sampling, and the heuristic node extension can decrease invalid expansion, thereby accelerating the running speed and improving the searching efficiency of the algorithm. The heuristic parent node selection speeds up the algorithm's calculation and improves path quality to some extent. The improved environmental constraint and the improved vehicle constraint integrated with the advanced obstacle avoidance distance are considered together to make the vehicle avoid the obstacle in advance and accurately and make the vehicle drive safely. Path reorganization aims to post-process the initial path to obtain a reasonable and differentiable path with the approximate optimal length, which can be tracked by the vehicle smoothly and successfully. In addition, path coherence solves the problem of the smooth connection of paths between different frames, enabling the vehicle to run smoothly and steadily at the connection point. Through the simulation experiments, the improved heuristic Bi-RRT algorithm can generate the smoothest path and takes the shortest time compared with the other four algorithms. 
As a result, it is an effective local path planning algorithm for autonomous vehicles and has practical value for applications in wheeled robots.

In future works, research will focus on increasing the solution speed and further reducing the calculation time, especially in the curved road scenario. The proposed algorithm will be applied in more complex driving scenarios, such as parking scenes and drift scenes with moving obstacles, to test its adaptiveness. Moreover, once the test platform of the autonomous vehicle is prepared, an on-site experiment will be conducted to test its effectiveness in practical applications.

**Author Contributions:** Conceptualization, X.Z.; methodology, X.Z.; software, X.Z.; validation, Y.H. and L.D.; resources, T.Z.; writing—original draft preparation, X.Z.; writing—review and editing, T.Z. and H.L.; funding acquisition, T.Z. and Y.H. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Key R&D Program of China (No. 2019YFE0108000) and the Shaanxi Provincial Basic Research Program of Natural Sciences Project (No. 2022JQ-540).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Data is contained within the article as figures and tables.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Computational Efficient Motion Planning Method for Automated Vehicles Considering Dynamic Obstacle Avoidance and Traffic Interaction**

**Yuxiang Zhang 1, Jiachen Wang 2, Jidong Lv 3, Bingzhao Gao 4, Hongqing Chu 5,\* and Xiaoxiang Na <sup>6</sup>**


**Abstract:** In complex driving scenarios, automated vehicles should behave reasonably and respond adaptively with high computational efficiency. In this paper, a computationally efficient motion planning method is proposed that considers traffic interaction and accelerates calculation. Firstly, the behavior is decided by connecting points on the unequally divided road segments and lane centerlines, which simplifies the decision-making process in both space and time. Secondly, as a dynamic vehicle model with changeable longitudinal velocity is considered in the trajectory generation module, the C/GMRES algorithm is used to accelerate the calculation of trajectory generation and realize online solving in nonlinear model predictive control. Meanwhile, the motion of other traffic participants is more accurately predicted based on the driver's intention and a kinematic vehicle model, which enables the host vehicle to obtain a more reasonable behavior and trajectory. The simulation results verify the effectiveness of the proposed method.

**Keywords:** autonomous vehicles; trajectory planning; model predictive control

#### **1. Introduction**

Due to the complexity of driving scenarios and the difficulty of accurately predicting the behavior of surrounding vehicles, automated vehicles need to adapt to great complexity and dynamics in real traffic [1,2]. Therefore, advanced motion planning algorithms should help the agent behave reasonably and respond adaptively in dynamic and complex driving scenarios with computationally efficient and reliable control [3].

#### *1.1. State-of-the-Art Review and Challenges*

The performance of motion planning methods is closely related to the trajectory prediction of other traffic participants, behavior planning, and trajectory generation. As automated vehicles frequently interact with other traffic participants, trajectory prediction influences the behavior planning and trajectory generation modules. The motion of environment vehicles is often predicted with constant longitudinal velocity or acceleration [4]. Such a prediction is inaccurate and decreases the reasonability of the behavior planning and trajectory generation modules [5,6]. Thus, more accurately predicted trajectories will promote the performance of motion planning.

In behavior planning aspects, machine learning-based or model-based methods usually decide a human-like behavior, such as lane change or obstacle avoidance, which contains a wide range of trajectories. To ensure absolute safety, decision-making methods are always conservative. Thus, more complex behaviors should be generated in decision-making. Partially observable Markov decision processes (POMDPs) can generate more abundant behaviors [7,8]. The road is divided into several segments and the points are connected to represent different behavioral decisions [9,10].

**Citation:** Zhang, Y.; Wang, J.; Lv, J.; Gao, B.; Chu, H.; Na, X. Computational Efficient Motion Planning Method for Automated Vehicles Considering Dynamic Obstacle Avoidance and Traffic Interaction. *Sensors* **2022**, *22*, 7397. https://doi.org/10.3390/s22197397

Academic Editor: Sindhuja Sankaran

Received: 22 August 2022 Accepted: 23 September 2022 Published: 28 September 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Regarding trajectory generation, current methods can be divided into graph-search-based methods [11], incremental search [12], interpolating curve methods [13], and numerical optimization [14,15]. Model Predictive Control (MPC) is widely used because it can explicitly deal with constraints to ensure safety while considering traffic interaction not only at the current time step but also over the predictive horizon [16,17]. As for the control model in MPC, the dynamic vehicle model is added to the kinematic model to realize stable motion control and enrich driving behaviors [18,19].

High-performance motion planning methods should be computationally efficient and enable automated vehicles to behave reasonably and respond adaptively in dynamic and complex scenarios. First, the host vehicle will inevitably interact with the surrounding vehicles while driving on the road. The motion of environment vehicles needs to be predicted over the planning horizon, which enables the host vehicle to behave reasonably and adapt to the dynamic traffic environment. Meanwhile, the driving process covers a large span in both time and space, which means that precise driving behavior causes huge calculations. To balance calculation time and planning performance, the problem formulation should be simplified without losing reasonability.

To further promote the reasonability of motion planning, besides the dynamic vehicle model, a variational velocity can be considered in trajectory generation. Thus, the control model changes from a linear model to a nonlinear model [20]. Such a control model increases the time for online solving, which calls for an additional fast solving algorithm to realize online calculation [21–23].

#### *1.2. Work and Contributions*

As shown in Figure 1, this paper proposes a computationally efficient motion planning method for autonomous vehicles, which behaves reasonably and adaptively in dynamic and complex driving scenarios. After predicting the trajectories of environment vehicles, behavior planning and trajectory generation are performed sequentially. The following improvements simplify the problem and decrease calculation to achieve computational efficiency. In the behavior planning module, the road is divided into several segments with road points, and different behaviors are represented by different connections between the road points in each segment. Firstly, rather than only setting road points in the center of the lane, road points are also distributed on the lane line, which enables the behavior planner to generate more complex behaviors. Meanwhile, in predictive control, short-term behavior is much more complex and should receive more attention, while prediction accuracy degrades as the predictive horizon lengthens. Thus, rather than dividing the road equally, unequal segments whose lengths gradually increase with distance further decrease calculation and improve reasonability. Secondly, based on the analysis of different traffic participants, the motion of static environment vehicles is much simpler, and the corresponding behaviors can be excluded directly. Therefore, static environment vehicles are used to narrow the feasible region of the solution and further decrease online calculation. Thirdly, a variational longitudinal velocity and a dynamic vehicle model are considered to improve the reasonability of trajectory generation. This also speeds up the trajectory following process with the resulting acceleration and steering wheel angle. We use the C/GMRES algorithm to realize online calculation in NMPC. The main contributions are summarized as follows:


• C/GMRES is used to realize online calculation and raise the reasonability of trajectory generation and trajectory following.
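The unequal road division mentioned in the framework above, with segment lengths growing as the distance from the host vehicle increases, can be sketched as follows. The geometric growth ratio is an assumed parameter, since the paper does not specify how the segment lengths increase; the function name is illustrative.

```python
def divide_road(horizon, n_segments, ratio=1.3):
    """Divide the planning horizon into n_segments whose lengths grow
    geometrically by `ratio` (> 1), so near-field behavior is resolved
    more finely than far-field behavior. Returns the segment boundaries."""
    # first segment length chosen so all segment lengths sum to the horizon
    first = horizon * (ratio - 1.0) / (ratio ** n_segments - 1.0)
    bounds, s = [0.0], 0.0
    for i in range(n_segments):
        s += first * ratio ** i
        bounds.append(s)
    return bounds
```

Road points are then placed at each boundary, on the lane centerlines and lane lines, and behaviors are encoded as connections between points of consecutive boundaries.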

The remainder of this paper is organized as follows. In Section 2, the coordinate systems and the trajectory prediction are introduced. Section 3 introduces the behavioral planning module. In Section 4, the trajectory generation module is introduced. In Section 5, the simulation process is shown, and the results are given and analyzed in detail. The simulation results verify the effectiveness of the proposed method. Section 6 is the conclusion of this paper.

**Figure 1.** Diagram of the proposed motion planning frame.

#### **2. Coordinate Systems Conversion and Trajectory Prediction**

In this section, first, the conversion between the Cartesian coordinate system and the Frenet coordinate system is introduced to simplify the planning process. Then, three ways of trajectory prediction are illustrated and compared.

#### *2.1. Coordinate Systems Conversion*

To describe the relation between the two coordinate systems, as shown in Figure 2, the Cartesian coordinate of *x* is *x*(*x*, *y*) while the Frenet coordinate of *x* is *x*(*s*, *l*), where *l* is the distance from *x* to the reference point [24]. In the coordinate systems conversion, the lane centerline is extracted with a third-degree polynomial equation as the reference curve of the Frenet coordinate system, i.e., *y* = *ax*<sup>3</sup> + *bx*<sup>2</sup> + *cx* + *d*. For the conversion from the Frenet coordinate *Ca* : (*sa*, *la*) to the Cartesian coordinate *Ca* : (*xa*, *ya*), the nearest point *p*0 works as the reference point to convert *pa*; its Cartesian coordinate can be written as *C*0 : (*x*0, *ax*0<sup>3</sup> + *bx*0<sup>2</sup> + *cx*0 + *d*). Since the arc length *sa* is already known, *x*0 can be calculated by dividing the curve and sampling from the starting point *ps*. The arc-length integral *s*′ of a sampling point *x*′ is computed similarly; comparing *s*′ with *sa* determines whether *x*′ is the desired coordinate point *x*0, which finally yields the coordinates of *p*0. The Cartesian coordinate of *pa*, *Ca* : (*xa*, *ya*), can then be calculated with

$$x\_a = x\_0 - l \cdot \sin(\arctan(k))\,,\tag{1a}$$

$$y\_a = y\_0 - l \cdot \cos(\arctan(k))\,.\tag{1b}$$

where *k* = 3*ax*0<sup>2</sup> + 2*bx*0 + *c* is the slope of the reference curve at point *p*0.
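As a rough illustration of Equation (1), the sketch below marches along the cubic reference line in small arc-length steps to locate *x*0 for a given *sa*, then applies the normal offset. The step size is an assumed discretization, the sign convention for *l* follows Equation (1) literally, and the names are illustrative.

```python
import math

def frenet_to_cartesian(s_a, l_a, coef, x_start=0.0, ds=0.01):
    """Convert Frenet (s_a, l_a) to Cartesian (x_a, y_a) for the cubic
    reference line y = a x^3 + b x^2 + c x + d, per Eq. (1): march from
    the start point until the accumulated arc length reaches s_a, then
    offset by l_a along the normal at that point."""
    a, b, c, d = coef
    x0, s = x_start, 0.0
    while s < s_a:                             # arc-length sampling
        k = 3 * a * x0 ** 2 + 2 * b * x0 + c   # slope at the current point
        x0 += ds / math.sqrt(1.0 + k * k)      # x-step for arc step ds
        s += ds
    y0 = a * x0 ** 3 + b * x0 ** 2 + c * x0 + d
    k = 3 * a * x0 ** 2 + 2 * b * x0 + c
    x_a = x0 - l_a * math.sin(math.atan(k))
    y_a = y0 - l_a * math.cos(math.atan(k))
    return x_a, y_a
```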

**Figure 2.** Diagram of coordinate systems conversion. *ps* is the starting point, and *pe* is the ending point. *pa* is the point on the trajectory of the vehicle to be converted. *p*0 is the nearest point, which works as the reference point to convert *pa*. *Tx* is the reference curve tangent vector in the Frenet coordinate system, and *Nx* is the normal vector in the Frenet coordinate system. *x*(*s*, *l*) is the Frenet coordinate of *x*.

For the conversion from the Cartesian coordinate *Ca* : (*xa*, *ya*) to the Frenet coordinate *Ca* : (*sa*, *la*), the reference point *p*0 is also needed. We use *D*<sup>2</sup> to represent the square of the distance from *pa* to the reference point *p*0. The horizontal coordinate *x*0<sup>∗</sup> of the reference point that minimizes *D*<sup>2</sup> satisfies (*D*<sup>2</sup>)′(*x*0<sup>∗</sup>) = 0, which can be solved by Newton's method [25]. By calculating (*D*<sup>2</sup>)′ and (*D*<sup>2</sup>)″, the iteration formula can be expressed as

$$x\_0^{\*,m+1} = x\_0^{\*,m} - \frac{(D^2)'(x\_0^{\*,m})}{(D^2)''(x\_0^{\*,m})}, \quad m = 0, 1, 2, \dots \tag{2}$$

The iteration stops when |*x*0<sup>∗,*m*+1</sup> − *x*0<sup>∗,*m*</sup>| < *δ*, and *x*0<sup>∗,*m*+1</sup> is taken as the target value. The Frenet coordinate *Ca* : (*sa*, *la*) can then be solved as

$$s\_a = \int\_{x\_s}^{x\_0^{\*,m+1}} \sqrt{1 + (3ax^2 + 2bx + c)^2}\, dx\,,\tag{3a}$$

$$l\_a = D(x\_0^{\*,m+1}) = \sqrt{(x\_a - x\_0^{\*,m+1})^2 + (y\_a - y\_0^{\*,m+1})^2}\,.\tag{3b}$$
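The Newton iteration of Equation (2) can be sketched with numerically differentiated *D*<sup>2</sup>; the analytic derivatives of the quartic *D*<sup>2</sup>(*x*) could equally be used. The helper below is illustrative, not the authors' implementation.

```python
def cartesian_to_frenet_x0(xa, ya, coef, x_init, delta=1e-8, max_iter=50):
    """Newton iteration of Eq. (2): find the reference-line abscissa x0*
    minimizing the squared distance D2(x) from point (xa, ya) to the cubic
    y = a x^3 + b x^2 + c x + d, using central differences for (D2)', (D2)''."""
    a, b, c, d = coef
    f = lambda x: a * x ** 3 + b * x ** 2 + c * x + d
    D2 = lambda x: (xa - x) ** 2 + (ya - f(x)) ** 2
    h = 1e-5
    x = x_init
    for _ in range(max_iter):
        d1 = (D2(x + h) - D2(x - h)) / (2 * h)              # (D2)'(x)
        d2 = (D2(x + h) - 2 * D2(x) + D2(x - h)) / h ** 2   # (D2)''(x)
        x_new = x - d1 / d2
        if abs(x_new - x) < delta:                          # Eq. (2) stop rule
            return x_new
        x = x_new
    return x
```

With `x0*` in hand, *sa* follows from the arc-length integral (3a) and *la* from the distance (3b).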

#### *2.2. Trajectory Prediction*

Information about the surrounding vehicles and their trajectories over a period of time in the future is essential for motion planning. Such trajectory prediction can be based on the driver's intention, the vehicle kinematics model, or both, which are compared in this section.

#### 2.2.1. Trajectory Prediction Based on Driver's Intention

We consider lane change and lane-keeping operations. For a given operation intention, countless driving trajectories can be realized: depending on the driver, the actual trajectory may be gentle or aggressive, and the geometric environment of the road also affects it. Therefore, trajectory prediction based on the operation intention generates a set of predicted trajectories from the current state of the vehicle, the operation intention, and the road parameters, and then selects the optimal one. Since the shape of the road greatly influences the predicted trajectory, the predicted trajectory cluster is first generated in the Frenet coordinate system and then converted into the Cartesian coordinate system.

In the Frenet coordinate system, *s*(*t*) and *l*(*t*) represent the longitudinal distance and lateral distance, respectively. We use *F*0 = (*s*0, *s*˙0, *s*¨0, *l*0, *l*˙0, *l*¨0) and *F*1 = (*s*1, *s*˙1, *s*¨1, *l*1, *l*˙1, *l*¨1) to represent the initial state and the end state of the vehicle trajectory. To ensure the continuity of the trajectory and provide unique expressions for different trajectories, high-order polynomials are fitted to represent the longitudinal distance *s*(*t*) and the lateral distance *l*(*t*) over time *t*. For the initial state *F*0, each state variable can be obtained by converting the current kinematic parameters in the Cartesian coordinate system, which can be expressed as

$$\begin{cases} l\_0 = l\_0^\*, \quad \dot{l}\_0 = v\_0 \sin(\theta\_0 - \theta\_{T\_0}), \\ \ddot{l}\_0 = \sqrt{a\_0^2 + \gamma\_0 v\_0^2} \sin(\theta\_0 - \theta\_{T\_0}), \\ s\_0 = 0, \quad \dot{s}\_0 = v\_0 \cos(\theta\_0 - \theta\_{T\_0}), \\ \ddot{s}\_0 = \sqrt{a\_0^2 + \gamma\_0 v\_0^2} \cos(\theta\_0 - \theta\_{T\_0}), \end{cases} \tag{4}$$

where $l\_0^\*$ is the distance from the initial position to the centerline of the lane, $\theta\_{T\_0}$ is the orientation of the tangent vector $T\_0$, $\gamma\_0 v\_0^2$ is the current normal acceleration of the vehicle, and $\gamma\_0$ is the curvature at the reference point.

We assume that the vehicle is on the centerline of its target lane after finishing the intended operation and that its longitudinal acceleration remains constant throughout the operation. Therefore, part of the operation termination state $F\_1$ can be calculated as

$$\begin{cases} l\_1 = l\_1^\*, \dot{l}\_1 = 0 \\ \ddot{l}\_1 = 0, \ddot{s}\_1 = a\_0 \end{cases} \tag{5}$$


For a complete lane-change operation, the time to complete the operation $t\_{end}$ is about 6 s, and $t\_{end}$ can be adjusted according to the driver's driving style. For lane-keeping operations, the time $t\_{end}$ is significantly shorter. We use $t\_1 \in [0, t\_{end}]$ to represent the end time of the operation; that is, $t\_1$ is sampled with a fixed step size from 0 to $t\_{end}$, giving $K$ samples. Since the longitudinal acceleration of the vehicle is assumed constant, the longitudinal speed at the termination state is $\dot{s}\_1 = v\_0 + a\_0 \cdot t\_1$. For the lateral distance, a fifth-degree polynomial in time $t$ can be used, which is calculated as

$$l(t) = c\_5 t^5 + c\_4 t^4 + c\_3 t^3 + c\_2 t^2 + c\_1 t + c\_0, \tag{6}$$

where $c\_0, c\_1, \dots, c\_5$ are the parameters of the curve. Then, $\dot{l}(t)$ and $\ddot{l}(t)$ can be calculated, respectively. Therefore, the lateral state values at the initial state ($t = 0$) and the terminal state ($t = t\_1$) can be written as

$$\begin{cases} l\_0 = l(0), \ \dot{l}\_0 = \dot{l}(0), \ \ddot{l}\_0 = \ddot{l}(0), \\ l\_1 = l(t\_1), \ \dot{l}\_1 = \dot{l}(t\_1), \ \ddot{l}\_1 = \ddot{l}(t\_1). \end{cases} \tag{7}$$

The parameters $c\_i$ can then be obtained by solving the following linear system:

$$\begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 1 \\ t\_1^5 & t\_1^4 & t\_1^3 & t\_1^2 & t\_1 & 1 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 5t\_1^4 & 4t\_1^3 & 3t\_1^2 & 2t\_1 & 1 & 0 \\ 0 & 0 & 0 & 2 & 0 & 0 \\ 20t\_1^3 & 12t\_1^2 & 6t\_1 & 2 & 0 & 0 \end{bmatrix} \cdot \begin{bmatrix} c\_5 \\ c\_4 \\ c\_3 \\ c\_2 \\ c\_1 \\ c\_0 \end{bmatrix} = \begin{bmatrix} l\_0 \\ l\_1 \\ \dot{l}\_0 \\ \dot{l}\_1 \\ \ddot{l}\_0 \\ \ddot{l}\_1 \end{bmatrix} = \begin{bmatrix} l\_0^\* \\ l\_1^\* \\ v\_0 \sin(\theta\_0 - \theta\_{T\_0}) \\ 0 \\ \sqrt{a\_0^2 + \gamma\_0 v\_0^2} \sin(\theta\_0 - \theta\_{T\_0}) \\ 0 \end{bmatrix}. \tag{8}$$

Since the longitudinal displacement depends on $t\_1$, a fourth-degree polynomial in time $t$ is used to fit the longitudinal kinematics, which can be expressed as

$$s(t) = c\_4't^4 + c\_3't^3 + c\_2't^2 + c\_1't + c\_0' \tag{9}$$

where $c\_0', c\_1', \dots, c\_4'$ are the parameters. The initial and terminal states can be represented by replacing $t$ in the above formula with $0$ and $t\_1$, respectively.

$$\begin{cases} s\_0 = s(0), \ \dot{s}\_0 = \dot{s}(0), \ \ddot{s}\_0 = \ddot{s}(0), \\ \dot{s}\_1 = \dot{s}(t\_1), \ \ddot{s}\_1 = \ddot{s}(t\_1). \end{cases} \tag{10}$$

Therefore, the parameters $c\_i'$ can be obtained by solving the following linear system:

$$\begin{bmatrix} 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \\ 4t\_1^3 & 3t\_1^2 & 2t\_1 & 1 & 0 \\ 0 & 0 & 2 & 0 & 0 \\ 12t\_1^2 & 6t\_1 & 2 & 0 & 0 \end{bmatrix} \cdot \begin{bmatrix} c\_4' \\ c\_3' \\ c\_2' \\ c\_1' \\ c\_0' \end{bmatrix} = \begin{bmatrix} s\_0 \\ \dot{s}\_0 \\ \dot{s}\_1 \\ \ddot{s}\_0 \\ \ddot{s}\_1 \end{bmatrix} = \begin{bmatrix} 0 \\ v\_0 \cos(\theta\_0 - \theta\_{T\_0}) \\ v\_0 + a\_0 \cdot t\_1 \\ \sqrt{a\_0^2 + \gamma\_0 v\_0^2} \cos(\theta\_0 - \theta\_{T\_0}) \\ a\_0 \end{bmatrix}. \tag{11}$$

Different $t\_1 \in [0, t\_{end}]$ correspond to different fitting parameters and therefore different driving trajectories. When selecting the optimal trajectory, the trajectories are first converted to the Cartesian coordinate system. The selection follows these principles: $t\_1$ should be as short as possible, the driving process should be comfortable, and the lateral displacement during the lane-change operation should be as small as possible. Therefore, the selection of the optimal predicted trajectory based on the driver's intention mainly considers the maximum normal acceleration along the trajectory and the time $t\_1$ to complete the operation:

$$C(traj\_i) = w\_1 \max\_t(\bar{a}(t)) + w\_2 t\_i, \tag{12}$$

where $traj\_i$ represents the $i$th trajectory and $C(traj\_i)$ its cost, $\bar{a}(t)$ is the normal acceleration at time $t$, $t\_i$ is the total time of the $i$th trajectory, and $w\_1$ and $w\_2$ are weighting coefficients. The trajectory with the smallest cost value is used as the optimal predicted trajectory based on the operation intention:

$$T\_{man} = \arg\min\_{i=1,\dots,K} C(traj\_i). \tag{13}$$
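The quintic lateral fit of Eqs. (6)-(8) can be illustrated numerically. The sketch below is ours, not the paper's code: it assembles the 6×6 boundary-condition system of Eq. (8) and solves it with NumPy; the boundary values in the usage example are assumed.

```python
import numpy as np

def quintic_lateral_coeffs(l0, dl0, ddl0, l1, t1):
    """Solve the 6x6 linear system of Eq. (8) for the quintic lateral
    polynomial l(t) = c5 t^5 + ... + c0, given the initial lateral state
    (l0, dl0, ddl0) and the terminal lane-center offset l1 (terminal
    lateral speed and acceleration are zero, as in Eq. (5))."""
    A = np.array([
        [0,         0,        0,       0,      0,  1],  # l(0)
        [t1**5,     t1**4,    t1**3,   t1**2,  t1, 1],  # l(t1)
        [0,         0,        0,       0,      1,  0],  # l'(0)
        [5*t1**4,   4*t1**3,  3*t1**2, 2*t1,   1,  0],  # l'(t1)
        [0,         0,        0,       2,      0,  0],  # l''(0)
        [20*t1**3,  12*t1**2, 6*t1,    2,      0,  0],  # l''(t1)
    ], dtype=float)
    b = np.array([l0, l1, dl0, 0.0, ddl0, 0.0])
    return np.linalg.solve(A, b)  # [c5, c4, c3, c2, c1, c0]

def eval_poly(c, t):
    """Evaluate the quintic by Horner's rule."""
    c5, c4, c3, c2, c1, c0 = c
    return ((((c5*t + c4)*t + c3)*t + c2)*t + c1)*t + c0
```

For example, a lane change from offset 0 to a 3.5 m lane center over 6 s starts and ends with zero lateral speed and acceleration, and the resulting curve passes through half the offset at the midpoint.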

#### 2.2.2. Trajectory Prediction Based on Vehicle Kinematics Model

The trajectory predicted from the driver's intention is more accurate over a longer time horizon but less accurate over a shorter one, whereas the trajectory predicted from the current kinematic parameters of the vehicle is more accurate over a shorter horizon. It is therefore necessary to take both the intention and the kinematics model into consideration. We assume that the vehicle acceleration and yaw rate remain unchanged. In the Cartesian coordinate system, the vehicle speeds along the $x$ and $y$ axes at time $t$ can then be represented as

$$v\_x(t) = v(t) \cdot \cos(w\_0 t + \theta\_0), \tag{14a}$$

$$v\_y(t) = v(t) \cdot \sin(w\_0 t + \theta\_0), \tag{14b}$$

where the velocity at time *t* is *v*(*t*) = *a*0*t* + *v*0. The predicted trajectory based on the kinematic model can be obtained by integrating the vehicle velocities,

$$\begin{cases} x(t) = \frac{a\_0}{w\_0^2} \cos(\theta(t)) + \frac{v(t)}{w\_0} \sin(\theta(t)) + c\_x, \\ y(t) = \frac{a\_0}{w\_0^2} \sin(\theta(t)) - \frac{v(t)}{w\_0} \cos(\theta(t)) + c\_y, \end{cases} \tag{15}$$

where $\theta(t) = w\_0 t + \theta\_0$, and $c\_x$ and $c\_y$ are two parameters determined by the initial state:

$$c\_x = x\_0 - \frac{v\_0}{w\_0} \sin(\theta\_0) - \frac{a\_0}{w\_0^2} \cos(\theta\_0), \tag{16a}$$

$$c\_y = y\_0 + \frac{v\_0}{w\_0} \cos(\theta\_0) - \frac{a\_0}{w\_0^2} \sin(\theta\_0). \tag{16b}$$

In particular, when the initial yaw rate $w\_0 = 0$, the predicted trajectory reduces to

$$traj\_{mdl}: \begin{cases} x(t) = \left(\frac{1}{2} a\_0 t^2 + v\_0 t\right) \cos(\theta\_0) + x\_0, \\ y(t) = \left(\frac{1}{2} a\_0 t^2 + v\_0 t\right) \sin(\theta\_0) + y\_0. \end{cases} \tag{17}$$
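The closed-form kinematic prediction of Eqs. (15)-(17) can be sketched as below. This is an illustrative implementation, not the paper's code: the integration constants are obtained by evaluating Eq. (15) at $t = 0$, and the function name and yaw-rate threshold are our assumptions.

```python
import math

def predict_kinematic(x0, y0, theta0, v0, a0, w0, t):
    """Position at time t under constant acceleration a0 and constant
    yaw rate w0, by integrating the speeds of Eq. (14). Falls back to
    the straight-line case of Eq. (17) when the yaw rate is near zero."""
    v = v0 + a0 * t
    if abs(w0) < 1e-6:  # Eq. (17): straight-line motion
        s = 0.5 * a0 * t**2 + v0 * t
        return x0 + s * math.cos(theta0), y0 + s * math.sin(theta0)
    theta = w0 * t + theta0
    # Integration constants fixed by requiring x(0) = x0, y(0) = y0.
    cx = x0 - (v0 / w0) * math.sin(theta0) - (a0 / w0**2) * math.cos(theta0)
    cy = y0 + (v0 / w0) * math.cos(theta0) - (a0 / w0**2) * math.sin(theta0)
    x = (a0 / w0**2) * math.cos(theta) + (v / w0) * math.sin(theta) + cx
    y = (a0 / w0**2) * math.sin(theta) - (v / w0) * math.cos(theta) + cy
    return x, y
```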

#### 2.2.3. Trajectory Prediction Based on Both Driver's Intention and Vehicle Kinematics Model

Since the trajectory predicted from the kinematics model is accurate only over a shorter prediction time, while the trajectory based on the driver's intention has higher accuracy over a longer period, a more accurate prediction is obtained by combining the two. Let the weight of the kinematics-based prediction be w(*t*); the combined predicted trajectory is then

$$traj(t) = w(t) \cdot traj\_{mdl}(t) + (1 - w(t)) \cdot traj\_{man}(t), \tag{18}$$

where w(*t*) ∈ [0, 1] is a time-varying weight designed so that the combined prediction is accurate. Here, w(*t*) is designed and tuned through multiple simulations as

$$w(t) = \begin{cases} \frac{2}{27} t^3 - \frac{1}{3} t^2 + 1 & \text{for } 0 \le t < 3, \\ 0 & \text{for } t \ge 3. \end{cases} \tag{19}$$

Combined with the driver's intention and the vehicle kinematics model, a more accurate predicted trajectory can be obtained. The results of trajectory prediction are shown in Figure 3. The trajectory that is predicted based on both operation intention and the vehicle kinematics model is more accurate than the other two methods.

**Figure 3.** The results of trajectory prediction based on operation intention, vehicle kinematics model, and both operation intention and vehicle kinematics model.
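The blending of Eqs. (18) and (19) can be sketched as follows. This is our illustration, assuming the cubic in Eq. (19) has denominator 27, which makes w equal to 1 at t = 0 and reach 0 with zero slope at t = 3 s; the list-of-points interface is also an assumption.

```python
def blend_weight(t):
    """Time-varying weight w(t) of Eq. (19): the kinematic prediction
    dominates early, the intention-based prediction later."""
    if t >= 3.0:
        return 0.0
    return (2.0 / 27.0) * t**3 - (1.0 / 3.0) * t**2 + 1.0

def blend_trajectories(traj_mdl, traj_man, dt):
    """Pointwise blend of Eq. (18); traj_mdl and traj_man are lists of
    (x, y) samples taken at a common time step dt."""
    out = []
    for i, ((xm, ym), (xi, yi)) in enumerate(zip(traj_mdl, traj_man)):
        w = blend_weight(i * dt)
        out.append((w * xm + (1 - w) * xi, w * ym + (1 - w) * yi))
    return out
```

Note that w(1.5) = 0.5, so halfway through the 3 s transition the two predictions contribute equally.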

#### **3. Behavioral Planning**

In behavioral planning, an optimal behavioral trajectory is selected by comprehensively considering the host vehicle information, the road information, and the surrounding environment information; this trajectory is then used for trajectory generation.

#### *3.1. Generation of Candidate Paths*

To reduce the size of the search space for trajectory planning and speed up the calculation, a series of candidate paths needs to be generated first; the optimal collision-free path is then selected from among them. The candidate paths form a set representing the behaviors the vehicle can take during planning, and the road space over the entire prediction time is divided into multiple road segments. Comprehensively considering the prediction length and the calculation time, the road space within the prediction time is divided into three road segments. The candidate points with the same *s* coordinate are called a layer of candidate points. The diagram of this division is shown in Figure 4. The road space occupied by each road segment can be represented as

$$s\_i = \begin{cases} N\_i \cdot \Delta s & \text{for } v > v\_{min}, \\ N\_i \cdot \Delta s\_{min} & \text{for } v \le v\_{min}, \end{cases} \tag{20}$$

where Δ*s* refers to the distance traveled by the vehicle in 1 s and *si* refers to the length of the *ith* road segment. To prevent the predicted length from being too short, a minimum interval Δ*smin* is set so that the vehicle can still plan a trajectory when the current vehicle speed is very low. The value of *Ni* is set by jointly considering the calculation time and the predicted length: the step length near the current position is short, and the steps farther away are longer.

**Figure 4.** Diagram of the generation of candidate paths.

All candidate paths together constitute a candidate set of future trajectories. Each layer of candidate points contains the center points of the current lane and the adjacent lane, as well as the lane-change points between the two lanes. By connecting the points in the candidate point sets, a series of candidate paths can be generated. To further narrow the search range, and considering that completing a lane-change operation in a very short time carries a high risk, the first layer contains only the center point of the current lane and the lane-change point with the adjacent lane.

The path candidate set is a connection of a series of points in the search space, but not every path in the set is drivable: there will inevitably be static and dynamic obstacles on the road. Setting aside dynamic obstacles for now, the paths that pass through static obstacles are removed first, since a vehicle driving along such a path would inevitably collide with a static obstacle. After this removal, all paths in the remaining set become the candidate paths of the behavior decision layer. Speed planning is then performed for these candidate paths to select the optimal one.

#### *3.2. Speed Profile*

The selection of vehicle speed requires comprehensive consideration of traffic and road information and restrictions, vehicle limitations, dynamic obstacles, and other information. Traffic road information mainly includes traffic lights, traffic signs, stop lines, maximum and minimum speed limits, etc., as shown in Figure 5. When selecting the optimal speed sequence, traffic road information must be extracted as the first constraint. Since road traffic information mainly limits the speed along the road driving direction, that is, the *S* direction, the resulting speed limit curve can be represented in a *v*-*s* profile.

**Figure 5.** Maximum velocity considering traffic information and lateral acceleration.

The maximum lateral acceleration of the vehicle while driving needs to consider the physical limitations of the vehicle and the impact on comfort. Based on the maximum lateral acceleration and the road curvature, the speed limit based on the lateral acceleration can be calculated as

$$v\_{a\_{y,max}} = \sqrt{\frac{a\_{y,max}}{C\_{Lane}}}. \tag{21}$$

Since the road curvatures $C\_{Lane}$ of adjacent lanes are the same, the maximum-lateral-acceleration limitation on the vehicle speed also acts only in the *S* direction. By comparing the two speed limits above, the limit curve of the vehicle speed in the *S* direction can be obtained, as shown in Figure 5.
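Combining Eq. (21) with the traffic-rule limit can be sketched as below; the function name and the straight-lane guard for zero curvature are our assumptions.

```python
import math

def curvature_speed_limit(ay_max, c_lane, v_traffic):
    """Speed limit in the S direction: the lateral-acceleration bound
    of Eq. (21) combined with the traffic-rule limit. c_lane is the
    lane curvature (1/m); a straight lane (zero curvature) leaves
    only the traffic limit."""
    if abs(c_lane) < 1e-9:
        return v_traffic
    v_ay = math.sqrt(ay_max / abs(c_lane))
    return min(v_ay, v_traffic)
```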

For dynamic obstacles, if time and space are considered simultaneously, speed selection becomes an optimal selection problem in the three-dimensional *S*-*L*-*T* space, which has extremely high computational complexity. To reduce the dimension, speed planning is performed once for each candidate path, so the *L* coordinate can be ignored and an *S*-*T* profile is used to plan the speed.

In an *S*-*T* profile, the *T* axis represents time along the candidate path into the future, and the *S* axis represents the distance along the candidate path from the origin. We assume that the dynamic obstacles keep constant acceleration within the prediction horizon. The dynamic obstacles are displayed as blue blocks, with a safety distance reserved around them. The vehicle must travel in the space-time area corresponding to the blank grid. The origin of the *S*-*T* map is the current position of the host vehicle, and the goal of speed planning is to reach the target *S* position.

Some restrictions and objective functions are necessary. First, the vehicle should travel as fast as possible, that is, close to the speed limit, which is penalized as

$$f\_{\upsilon} = \left| \upsilon\_{\max}(s\_i) - \upsilon\_i \right|. \tag{22}$$

In addition, large acceleration will reduce driving comfort, so acceleration needs to be limited as

$$f\_a = |a\_i| = \frac{|v\_i - v\_{i-1}|}{\Delta t}.\tag{23}$$

Finally, it is necessary to avoid frequent acceleration and deceleration of the vehicle during driving, which is

$$f\_{jerk} = \frac{|a\_i - a\_{i-1}|}{\Delta t} = \frac{|v\_i - 2v\_{i-1} + v\_{i-2}|}{\Delta t^2}. \tag{24}$$

In summary, the cost function is

$$f\_{speed} = w\_v f\_v + w\_a f\_a + w\_j f\_{jerk}, \tag{25}$$

where *wv*, *wa* and *wj* are coefficients corresponding to *fv*, *fa* and *fjerk*. The optimal speed sequence for a certain behavior can be obtained through the cost function as shown in Figure 6.

**Figure 6.** Speed profile.
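The cost terms of Eqs. (22)-(25) for a discrete speed sequence can be sketched as follows; the list-based interface and default unit weights are our illustrative assumptions, not the paper's implementation.

```python
def speed_profile_cost(v, v_max, dt, wv=1.0, wa=1.0, wj=1.0):
    """Cost of a discrete speed sequence v along a candidate path,
    summing the terms of Eqs. (22)-(25): deviation from the speed
    limit v_max, acceleration magnitude, and jerk magnitude."""
    cost = 0.0
    for i in range(len(v)):
        cost += wv * abs(v_max[i] - v[i])                # Eq. (22)
        if i >= 1:
            cost += wa * abs(v[i] - v[i-1]) / dt         # Eq. (23)
        if i >= 2:
            cost += wj * abs(v[i] - 2*v[i-1] + v[i-2]) / dt**2  # Eq. (24)
    return cost
```

A sequence that holds the speed limit exactly has zero cost; any limit deviation, acceleration, or jerk adds to it.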

#### *3.3. Optimal Behavioral Trajectory Selection*

Each candidate trajectory in the behavior planning candidate set represents a behavioral operation, and it is especially important to select the optimal one. It is necessary to consider efficiency, comfort, energy consumption, and other aspects.

First, the number of lane changes needs to be considered. *CLC* is a cost function factor affected by the number of lane changes. Since lane changing greatly increases energy consumption and greatly reduces comfort, it is necessary to avoid unnecessary and frequent lane changing operations. Thus,

$$C\_{LC} = \sum\_{k=1}^{N\_{layer}} |Lane(n\_k) - Lane(n\_{k-1})|, \tag{26}$$

where $N\_{layer}$ is the number of layers, $n\_k$ is the candidate point selected in the $k$th layer, and $Lane(n\_k)$ denotes the lane of candidate point $n\_k$.

In addition, since the *S* coordinate of the endpoint of each candidate route is the same, a shorter travel time means higher efficiency, so the time cost is simply *CT* = *T*. Frequent changes in behavior reduce comfort and increase control difficulty. To reduce unnecessary changes in the behavior plan, a consistency cost is introduced to measure the difference between the candidate behavior at the current time and the previously executed behavior. Thus, *Ccon* can be formulated as

$$C\_{con} = \sum\_{j=1}^{N\_b - k} \left[ (S\_t^j - S\_{t-\Delta t}^{j+k})^2 + (L\_t^j - L\_{t-\Delta t}^{j+k})^2 \right]. \tag{27}$$

When generating the speed profile in the previous step, the candidate trajectory has already been discretized. The step size after discretization is $\Delta t\_{ST}$, and the total number of discrete points is $N\_b = T / \Delta t\_{ST} + 1$. $\Delta t\_{re}$ is the discrete step size of the model used for resampling, and $k = \Delta t\_{re} / \Delta t\_{ST}$. $S\_t^j$ and $L\_t^j$ represent the Frenet coordinates of the $j$th point on the candidate trajectory at time $t$, while $S\_{t-\Delta t}^{j+k}$ and $L\_{t-\Delta t}^{j+k}$ represent the coordinates of the same corresponding point on the trajectory planned at time $t - \Delta t$. In this way, by constraining the difference between corresponding points in the two behavior planning paths taken at adjacent moments, the consistency and continuity of the behavior planning path is enforced.

Last but not least, to keep the vehicle in the center of the road as much as possible while lane-keeping, $C\_{boun} = \sum n\_{boun}$ counts the number of nodes in the planned behavior trajectory at which the vehicle is on the road boundary.

In summary, a behavior path that is optimal in terms of efficiency, comfort, and energy consumption is selected according to the following cost function, and this behavior path is used as a reference for motion planning.

$$C(t) = w\_{LC} C\_{LC}(t) + w\_T C\_T(t) + w\_{con} C\_{con}(t). \tag{28}$$
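The behavior cost of Eqs. (26)-(28) can be sketched as below. The data layout, default weights, and the omission of the boundary term (matching Eq. (28)) are our assumptions for illustration.

```python
def behavior_cost(lane_ids, total_time, cur_pts, prev_pts, k,
                  w_lc=1.0, w_t=1.0, w_con=1.0):
    """Cost of one candidate behavior trajectory, Eq. (28):
    lane-change count C_LC (Eq. (26)), travel time C_T = T, and
    consistency C_con with the previously executed behavior
    (Eq. (27)). cur_pts/prev_pts are (S, L) samples; k is the
    index offset aligning the two plans in time."""
    c_lc = sum(abs(lane_ids[i] - lane_ids[i-1])
               for i in range(1, len(lane_ids)))
    c_con = sum((cur_pts[j][0] - prev_pts[j+k][0])**2 +
                (cur_pts[j][1] - prev_pts[j+k][1])**2
                for j in range(len(cur_pts) - k))
    return w_lc * c_lc + w_t * total_time + w_con * c_con
```

A candidate that changes lane once, takes 5 s, and exactly overlaps the previous plan costs 1 + 5 + 0.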

#### *3.4. Resampled Behavioral Trajectory*

The step length of the behavior trajectory is relatively long. After the optimal behavior trajectory is selected, it must be converted from the Frenet coordinate system to the Cartesian coordinate system; the diagram of this process is shown in Figure 7. The step length of the candidate trajectory is $\Delta t\_{ST}$, with time sequence $\Delta t\_{ST}^0, \Delta t\_{ST}^1, \dots$. The step size used in resampling is $\Delta t\_{re}$, with time sequence $\Delta t\_{re}^0, \Delta t\_{re}^1, \dots$. Since $\Delta t\_{ST}$ is larger than $\Delta t\_{re}$, the optimal behavior trajectory must be interpolated to shorten the step size. During interpolation, the vehicle speed is assumed constant within each $\Delta t\_{ST}$, and the number of $\Delta t\_{re}$ steps contained in each $\Delta t\_{ST}$ is $N = \Delta t\_{ST} / \Delta t\_{re}$. The time sequence of the resampled trajectory can be written as $\Delta t\_{ST}^0, \Delta t\_{re}^{01}, \dots, \Delta t\_{re}^{0(N-1)}, \Delta t\_{ST}^1, \Delta t\_{re}^{11}, \dots$, where $\Delta t\_{ST}^i$ is the time of the $i$th step and $\Delta t\_{re}^{ij}$ is the $j$th resampled point within the $i$th step.

**Figure 7.** Resampled behavioral trajectory.
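The resampling described above can be sketched with simple linear interpolation, consistent with the constant-speed-within-a-step assumption; the (x, y) point representation is our illustrative choice.

```python
def resample_trajectory(pts, dt_st, dt_re):
    """Interpolate a coarse behavior trajectory (step dt_st) down to
    the resampling step dt_re, with N = dt_st / dt_re points per
    coarse interval; pts are (x, y) samples."""
    n = int(round(dt_st / dt_re))
    out = []
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        for j in range(n):  # j = 0 reproduces the coarse sample
            a = j / n
            out.append((x0 + a * (x1 - x0), y0 + a * (y1 - y0)))
    out.append(pts[-1])  # keep the final coarse sample
    return out
```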

#### **4. Trajectory Generation**

With the behavior selected by the behavioral planning module, nonlinear model predictive control is used in the trajectory generation module. It considers a time-varying longitudinal velocity and a dynamic vehicle model, and uses the C/GMRES algorithm to speed up the computation.

#### *4.1. Vehicle Dynamic Model*

To describe the vehicle dynamics more accurately, this paper uses a vehicle dynamics model. The coordinate system is right-handed, with its origin at the center of mass of the vehicle. According to Newton's second law, the dynamics can be represented as

$$2(F\_{yf} + F\_{yr}) = m \cdot a\_y, \tag{29a}$$

$$\dot{w} = \frac{2(F\_{yf} \cdot l\_f - F\_{yr} \cdot l\_r)}{I\_z}, \tag{29b}$$

where $m$ is the mass of the vehicle and $I\_z$ is its yaw inertia. $F\_{yf}$ and $F\_{yr}$ are the lateral forces of the front and rear wheels, respectively. $a\_y$ is the lateral acceleration. $l\_f$ and $l\_r$ are the distances from the center of mass to the front and rear axles, respectively. $w$ is the yaw rate. $\beta$ denotes the ratio of the lateral speed to the longitudinal speed,

$$
\beta = \frac{v\_y}{v\_x}.\tag{30}
$$

From the geometric relationship, the slip angles of the front and rear wheels can be represented as

$$\alpha\_f = -\delta + \arctan\left(\frac{v\_y + l\_f w}{v\_x}\right) \approx -\delta + \frac{w l\_f}{v\_x} + \beta, \tag{31a}$$

$$\alpha\_r = \arctan\left(\frac{v\_y - l\_r w}{v\_x}\right) \approx \beta - \frac{w l\_r}{v\_x}, \tag{31b}$$

where $\delta$ is the steering angle. When the slip angle is small, the tire characteristics can be regarded as linear, which is calculated as

$$F\_{yf} = k\_f \alpha\_f, \tag{32a}$$

$$F\_{yr} = k\_r \alpha\_r. \tag{32b}$$

The derivative of the longitudinal speed is the acceleration $a$ of the vehicle.

Using $X = [x, y, \varphi, v\_x, v\_y, w]^T$ as the state vector of the vehicle dynamics system and $U = [a, \delta]^T$ as the input vector, the state equation of the vehicle dynamics system is

$$\dot{X} = f(X, U) = \begin{bmatrix} v\_x \cos\varphi - v\_y \sin\varphi \\ v\_x \sin\varphi + v\_y \cos\varphi \\ w \\ a \\ \frac{2k\_f(v\_y + w l\_f - \delta v\_x) + 2k\_r(v\_y - w l\_r)}{m v\_x} - v\_x w \\ \frac{2l\_f k\_f(v\_y + w l\_f - \delta v\_x) - 2l\_r k\_r(v\_y - w l\_r)}{I\_z v\_x} \end{bmatrix}. \tag{33}$$

The continuous-time state equation of the vehicle dynamics system is transformed into the discrete form

$$X\_{k+1} = X\_k + f(X\_k, \mathcal{U}\_k) \Delta t \,. \tag{34}$$
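One forward-Euler step of Eq. (34) with the model of Eq. (33) can be sketched as follows. The vehicle parameter values are illustrative assumptions, not taken from the paper, and the cornering stiffnesses follow the linear tire model of Eq. (32); sign conventions vary between sources.

```python
import math

def dynamics_step(X, U, dt, m=1500.0, Iz=2500.0,
                  lf=1.2, lr=1.4, kf=-80000.0, kr=-80000.0):
    """One Euler step of Eq. (34). X = [x, y, phi, vx, vy, w],
    U = [a, delta]; requires vx != 0 (the model of Eq. (33)
    divides by the longitudinal speed)."""
    x, y, phi, vx, vy, w = X
    a, delta = U
    f = [
        vx * math.cos(phi) - vy * math.sin(phi),
        vx * math.sin(phi) + vy * math.cos(phi),
        w,
        a,
        (2*kf*(vy + w*lf - delta*vx) + 2*kr*(vy - w*lr)) / (m*vx) - vx*w,
        (2*lf*kf*(vy + w*lf - delta*vx) - 2*lr*kr*(vy - w*lr)) / (Iz*vx),
    ]
    return [X[i] + f[i] * dt for i in range(6)]
```

Driving straight with zero steering leaves the lateral states at zero while the position and longitudinal speed advance.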

#### *4.2. Controller Design*

To complete the task of trajectory planning, the objective functions and constraints of the model predictive control must be designed rationally. The input values of the vehicle system cannot exceed the limits of its physical structure, so upper and lower bounds on the input variables are needed:


$$|a\_k| \le a\_{\max}, \tag{35a}$$

$$|\delta\_k| \le \delta\_{\max}. \tag{35b}$$

To ensure that the planned trajectory does not collide with obstacles, it is necessary to maintain a certain safety distance between the trajectory at each moment and the obstacle at the corresponding moment. Since the longitudinal speed of the vehicle is often much higher than the lateral speed, a larger safe distance is needed in the longitudinal direction than in the lateral direction. The safety range around the host autonomous vehicle is designed as an ellipse, where the long axis of the ellipse is the longitudinal direction so that the safety constraint can be expressed as

$$\left(\frac{x\_{host} - x\_{obs}}{r\_x}\right)^2 + \left(\frac{y\_{host} - y\_{obs}}{r\_y}\right)^2 \ge 1, \tag{36}$$

where $r\_x$ and $r\_y$ represent the safety distances in the longitudinal and lateral directions, respectively. The larger the safety distances, the earlier the host vehicle acts to avoid the obstacle.
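The elliptical check of Eq. (36) can be sketched directly; the radii used below are illustrative values, not ones from the paper.

```python
def is_safe(x_host, y_host, x_obs, y_obs, rx=6.0, ry=2.0):
    """Elliptical safety constraint of Eq. (36): the obstacle must lie
    outside an ellipse around the host vehicle whose long axis (rx) is
    longitudinal and short axis (ry) lateral."""
    return ((x_host - x_obs) / rx)**2 + ((y_host - y_obs) / ry)**2 >= 1.0
```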

To make the final trajectory planning result close to the behavioral planning path, the following objective function is needed

$$J\_{conv} = w\_1 (x\_k - x\_{ref,k})^2 + w\_2 (y\_k - y\_{ref,k})^2, \tag{37}$$

where $x\_k$ and $y\_k$ are the coordinates of the trajectory at time step $k$, and $x\_{ref,k}$ and $y\_{ref,k}$ are the coordinates of the resampled behavioral planning path at time step $k$.

As mentioned before, excessive acceleration and a small turning radius greatly reduce driving comfort, so it is necessary to limit their values:

$$J\_{\rm com} = w\_3 a\_k^2 + w\_4 \delta\_k^2 \,. \tag{38}$$

A terminal cost is added so that the final trajectory matches the planned behavioral path better:

$$J\_{end} = \frac{1}{2} (X\_N - X\_{ref,N})^T S\_f (X\_N - X\_{ref,N}). \tag{39}$$

When there are two environment vehicles, the expression of the controller for motion planning is

$$\min\_{U} J = \frac{1}{2}(X\_N - X\_{ref,N})^T S\_f (X\_N - X\_{ref,N}) + \sum\_{k=1}^{N} \Big( w\_1 (x\_k - x\_{ref,k})^2 + w\_2 (y\_k - y\_{ref,k})^2 + w\_3 a\_k^2 + w\_4 \delta\_k^2 \Big) \Delta t \tag{40a}$$

$$\text{s.t. } X\_{k+1} = X\_k + f(X\_k, U\_k) \Delta t, \tag{40b}$$

$$|a| \le a\_{\max}, \ |\delta| \le \delta\_{\max}, \tag{40c}$$

$$\left(\frac{x\_k - x\_{obs,k}^j}{r\_x}\right)^2 + \left(\frac{y\_k - y\_{obs,k}^j}{r\_y}\right)^2 \ge 1, \quad j = 1, 2. \tag{40d}$$

Since the vehicle dynamics model used is a non-linear model, a suitable solving algorithm is necessary. In this paper, a Continuation/GMRES algorithm is used to solve the nonlinear model predictive control problem.

#### *4.3. C-GMRES*

In the nonlinear model predictive control problem, the small sampling period of mechanical systems places a great burden on the computing platform. Based on the Generalized Minimum Residual method (GMRES), Ohtsuka introduced the Continuation/GMRES (C/GMRES) method, which solves a linear equation for the derivative of the solution at each sampling instant and thereby updates the control input sequence [26]. The detailed C/GMRES algorithm used in this paper is given in Algorithm 1.

#### **Algorithm 1** C/GMRES Algorithm

*Initialize:* $t = 0$, $l = 0$, initial state $x\_0 = x(0)$; find $U\_0$ analytically or numerically such that $\|F(U\_0, x\_0, 0)\| \le \delta$ for some positive $\delta$; set the maximum iteration number $k\_{max}$.
1. For $t' \in [t, t + \Delta t]$, compute the real control input by $u(t') = P\_0(U\_l)$.
2. At the next sampling instant $t + \Delta t$, measure the state $x\_{l+1} = x(t + \Delta t)$ and set $\Delta x\_l = x\_{l+1} - x\_l$.
3. $\dot{U}\_l = FDGMRES(U\_l, x\_l, \Delta x\_l / \Delta t, t, \hat{\dot{U}}\_l, h, k\_{max})$, where $\hat{\dot{U}}\_l = \hat{\dot{U}}\_{l-1}$ with $\hat{\dot{U}}\_{-1} = 0$.
4. $U\_{l+1} = U\_l + \dot{U}\_l \Delta t$.
5. Update $t = t + \Delta t$, $l = l + 1$.

To solve the trajectory planning problem, dummy inputs are first used to convert the inequality constraints into equality constraints:

$$(\delta\_k^2 + u\_{d1,k}^2 - \delta\_{\max}^2)/2 = 0, \tag{41a}$$

$$(a\_k^2 + u\_{d2,k}^2 - a\_{\max}^2)/2 = 0, \tag{41b}$$

$$\left(\frac{x\_k - x\_{obs,k}^1}{r\_x}\right)^2 + \left(\frac{y\_k - y\_{obs,k}^1}{r\_y}\right)^2 - 1 - u\_{d3,k}^2 = 0, \tag{41c}$$

$$\left(\frac{x\_k - x\_{obs,k}^2}{r\_x}\right)^2 + \left(\frac{y\_k - y\_{obs,k}^2}{r\_y}\right)^2 - 1 - u\_{d4,k}^2 = 0. \tag{41d}$$

Meanwhile, to prevent multiple solutions for the dummy inputs, a small penalty on the dummy inputs is added to the objective function:

$$\min\_{U} J = \frac{1}{2}(X\_N - X\_{ref,N})^T S\_f (X\_N - X\_{ref,N}) + \sum\_{k=1}^{N} \Big( w\_1 (x\_k - x\_{ref,k})^2 + w\_2 (y\_k - y\_{ref,k})^2 + w\_3 a\_k^2 + w\_4 \delta\_k^2 - w\_{U\_d} U\_{d,k} \Big) \Delta t,$$

where $w\_{U\_d}$ is a small positive constant.

Then, define the Hamiltonian by

$$H(x, \lambda, u, \mu, p) = L(x, u, p) + \lambda^T f(x, u, p) + \mu^T C(x, u, p), \tag{42}$$

where $\lambda \in \mathbb{R}^n$ is the costate and $\mu \in \mathbb{R}^{m\_c}$ is the Lagrange multiplier; $m\_c$ is the dimension of the constraints.

For an optimal control $\{u\_k^\*\}\_{k=0}^{N-1}$, there exist $\{\lambda\_k^\*\}\_{k=0}^{N}$ and $\{\mu\_k^\*\}\_{k=0}^{N-1}$ such that the following conditions are satisfied:

$$x\_{k+1}^\* = x\_k^\* + f(x\_k^\*, u\_k^\*, p\_k) \Delta t, \tag{43a}$$

$$\lambda\_k^\* = \lambda\_{k+1}^\* + H\_x^T(x\_k^\*, \lambda\_{k+1}^\*, u\_k^\*, \mu\_k^\*, p\_k) \Delta t, \tag{43b}$$

$$\lambda\_N^\* = \varphi\_x^T(x\_N^\*, p\_N), \tag{43c}$$

$$x\_0^\* = x(t\_0), \tag{43d}$$

$$H\_u(x\_k^\*, \lambda\_{k+1}^\*, u\_k^\*, \mu\_k^\*, p\_k) = 0, \tag{43e}$$

$$C(x\_k^\*, u\_k^\*, p\_k) = 0. \tag{43f}$$

Here, Equation (43) can be summarized as

$$F(U(t), x(t), t) = \begin{bmatrix} H\_u^T(x\_0^\*, \lambda\_1^\*, u\_0^\*, \mu\_0^\*, p\_0) \\ C(x\_0^\*, u\_0^\*, p\_0) \\ \vdots \\ H\_u^T(x\_{N-1}^\*, \lambda\_N^\*, u\_{N-1}^\*, \mu\_{N-1}^\*, p\_{N-1}) \\ C(x\_{N-1}^\*, u\_{N-1}^\*, p\_{N-1}) \end{bmatrix} = 0. \tag{44}$$

Then, $\dot{F}(U, x, t)$ can be expressed as

$$\dot{F}(U, x, t) = A\_s F(U, x, t). \tag{45}$$

Here, $A\_s$ is an introduced stable matrix that stabilizes $F(U, x, t)$ at the origin. Then, $\dot{U}$ can be computed from

$$\dot{U} = F\_U^{-1} \left( A\_s F - F\_{\mathbf{x}} \dot{\mathbf{x}} - F\_t \right). \tag{46}$$

The solution curve *U*(*t*) is approximated by forward difference if an initial solution *U*(0) satisfying *F*(*U*(0), *x*(0), 0) = 0 can be found. Here, the generalized minimal residual (GMRES) method is applied to solve the linear equation $F\_U \dot{U} = A\_s F - F\_{\mathbf{x}}\dot{\mathbf{x}} - F\_t$. The combination of the forward difference approximation and GMRES is called FDGMRES.
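As a concrete illustration of the continuation update in Equation (46), the sketch below advances *U* by one forward-Euler step, with every Jacobian–vector product (*F<sub>U</sub>v*, *F<sub>x</sub>ẋ*, *F<sub>t</sub>*) replaced by a forward difference and the linear system solved by GMRES. This is a minimal sketch on an arbitrary residual map `F`, not the paper's implementation; the function name, step sizes, and tolerances are illustrative assumptions.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def fdgmres_step(F, U, x, t, x_dot, A_s, dt, h=1e-6):
    """One FDGMRES continuation step for F(U, x, t) = 0:
    solve F_U @ U_dot = A_s @ F - F_x @ x_dot - F_t with GMRES,
    approximating every Jacobian product by a forward difference,
    then advance U by forward Euler."""
    F0 = F(U, x, t)
    # Directional derivatives via forward differences
    jv = lambda v: (F(U + h * v, x, t) - F0) / h      # F_U @ v
    Fx_xdot = (F(U, x + h * x_dot, t) - F0) / h       # F_x @ x_dot
    Ft = (F(U, x, t + h) - F0) / h                    # F_t
    A_op = LinearOperator((F0.size, U.size), matvec=jv, dtype=float)
    U_dot, info = gmres(A_op, A_s @ F0 - Fx_xdot - Ft, atol=1e-12)
    assert info == 0, "GMRES did not converge"
    return U + dt * U_dot
```

For instance, with the toy residual *F*(*U*, *x*, *t*) = *U* − *x* and *A<sub>s</sub>* = −*I*, starting from an exact initial solution *U*(0) = *x*(0) makes the updated *U* track *x* along the trajectory without ever forming a Jacobian explicitly.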

#### **5. Simulation**

In this section, the proposed computationally efficient motion planning method is verified in three different environments. According to the information about the obstacle and the predicted trajectory, the behavior is selected, and the trajectory is optimized using NMPC, which is solved quickly by the C/GMRES algorithm.

#### *5.1. Obstacle Avoidance on Straight Lane*

As shown in Figure 8, the host vehicle travels on a straight lane with an initial speed of 10 m/s. The maximum speed limit of the road is 15 m/s, and each lane is 4 m wide. An obstacle vehicle is 30 m ahead of the host vehicle at a speed of 5 m/s. The trajectories of the host vehicle and the obstacle vehicle are shown in Figure 8a. The speed profiles of the two vehicles and the steering wheel angle of the host vehicle are shown in Figure 8b,c. In this driving scenario, the host vehicle executes two consecutive lane-changing operations smoothly and quickly to avoid the dynamic obstacle while keeping its speed under the maximum limit.

The processor of the computer is an Intel(R) Core(TM) i7-6700HQ CPU @ 2.60 GHz. The time step size Δ*t* for simulation is 0.05 s, and the nonlinear model predictive control problem is solved using the C/GMRES algorithm, which is also compared with the fmincon function in MATLAB. As shown in Figure 9, the computation time of C/GMRES is much less than the time step size Δ*t*, and also much less than that of the fmincon function in MATLAB, which exceeds 10 min. This shows that the local path planning module solved by C/GMRES better meets the requirements on solving speed.

**Figure 9.** Calculating time.

#### *5.2. Obstacle Avoidance on Winding Lane*

In the scenario of a winding road, the center line of the initial lane is *y* = 10<sup>−6</sup>*x*<sup>3</sup> + 10<sup>−5</sup>*x*<sup>2</sup>. The obstacle vehicle travels at a speed of 5 m/s, 50 m in front of the host vehicle. The trajectories and speed profiles of the two vehicles and the steering wheel angle of the host vehicle are shown in Figure 10. The host vehicle chooses to accelerate to overtake the preceding vehicle and avoid the obstacle.

#### *5.3. Lane-Changing Obstacle Avoidance*

A much more complex scenario is verified in the third simulation, in which the intention of the obstacle changes. The trajectory of the obstacle vehicle is selected from an open data set, in which it executes a lane change to the left lane. The trajectories and speed profiles of the two vehicles and the steering wheel angle of the host vehicle are shown in Figure 11.

**Figure 10.** Simulation results of obstacle avoidance on winding lane. Here, (**a**) is vehicle trajectory. (**b**) is vehicle velocity. (**c**) is steering wheel angle.

**Figure 11.** Simulation results of lane-changing obstacle avoidance. Here, (**a**) is vehicle trajectory. (**b**) is vehicle velocity. (**c**) is steering wheel angle.

As shown in Figure 11a, at first, the obstacle vehicle chooses lane-keeping before changing lanes to the left lane. In this process, the host vehicle tries to change lanes. After the obstacle vehicle decides to change to the target lane of the host vehicle, the host vehicle decides to change back to its original lane. From the above simulation results, the proposed motion planning method considers safety, computational efficiency, and comfort simultaneously and obtains good system performance.

#### **6. Conclusions**

This paper proposes a computationally efficient motion planning method for autonomous vehicles, which considers dynamic obstacle avoidance and traffic interaction. The decision process for complex behavior is reasonably simplified in both the time and space spans. Different points located on unequally divided road segments and lane centerlines are connected to represent behaviors, and the C/GMRES algorithm is used to accelerate the solution of the NMPC problem in the trajectory generation module. The trajectories of other traffic participants are more accurately predicted with known intentions and vehicle models, which enables the movement to be planned more reasonably. Finally, three groups of simulation experiments are carried out to verify the rationality and superiority of the algorithm.

In future works, the interactive intention prediction will be considered in the intention predictive layer, which can extend the motion planning method from reacting adaptively to predicting adaptively. By considering the interactions between the ego vehicle and surrounding drivers socially via implicit and/or explicit communications, the behavior of the autonomous vehicle can be more human-like and facilitate safety performance under complex and dynamic environments [27].

**Author Contributions:** Conceptualization, B.G. and H.C.; methodology, J.L. and Y.Z.; validation, J.L. and J.W.; investigation, X.N.; writing—original draft preparation, J.L. and Y.Z.; writing—review and editing, Y.Z. and J.W.; supervision, H.C.; funding acquisition, B.G. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported in part by the International Technology Cooperation Program of Science and Technology Commission of Shanghai Municipality (21160710600), in part by the Shanghai Pujiang Program (21PJD075), in part by the China Automobile Industry Innovation and Development Joint Fund (U1864206).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy reasons.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Driver Assisted Lane Keeping with Conflict Management Using Robust Sliding Mode Controller**

**Gabriele Perozzi 1, Mohamed Radjeb Oudainia 1, Chouki Sentouh 1, Jean-Christophe Popieul <sup>1</sup> and Jagat Jyoti Rath 2,\***


**Abstract:** Lane-keeping assistance design for road vehicles is a multi-objective design problem that needs to simultaneously maintain lane tracking, ensure driver comfort, provide vehicle stability, and minimize conflict between the driver and the autonomous controller. In this work, a cooperative control strategy is proposed for lane keeping by integrating driving monitoring, variable level of assistance allocation, and human-in-the-loop control. In the first stage, a time-varying physical driver loading pattern is identified based on a relationship between lateral acceleration, road curvature, and the measured maximum driver torque. Together with the monitored driver state that indicates driver mental loading, an adaptive driver activity function is then formulated that replicates the levels of assistance required for the driver in the next stage. To smoothly transition authority between various modes (from manual to autonomous and vice versa) based on the generated levels of assistance, a novel higher-order sliding mode controller is proposed and closed-loop stability is established. Further, a novel sharing parameter (which is proportional to the torques coming from the driver and from the autonomous controller) is used to minimize the conflict. Experimental results on the SHERPA high-fidelity vehicle simulator show the real-time implementation feasibility. Extensive experimental results provided on the Satory test track show improvement in cooperative driving quality by 9.4%, reduction in steering workload by 86.13%, and reduced conflict by 65.38% when compared with the existing design (no sharing parameter). These results on the cooperative performance highlight the significance of the proposed controller for various road transportation challenges.

**Keywords:** human-machine shared control; lane keeping assistance; higher order sliding mode; conflict minimization; ADAS; driver assist system

#### **1. Introduction**

Advanced driver assist systems (ADASs; acronyms of this manuscript are defined in the Acronyms section) such as lane keeping assist (LKA), adaptive cruise control (ACC), and collision avoidance (CA) systems have been widely employed in commercial vehicles. These systems greatly reduce the workload of human drivers and reduce the risk of accidents and crashes by warning or supporting the driver for particular maneuvers [1]. The ADASs developed for semi-autonomous driving scenarios can be categorized into human-guided, human-supervised, and human-assisted architectures [2]. In recent works, it has been established that driver-in-the-loop (DiL) human-assisted ADAS architectures can be employed to address various human–machine interaction (HMI) challenges inclusive of authority allocation [3], the transition of authority [4], conflict management [5], and human driver workload reduction and skill enhancement [6]. Such cooperative driving architectures have been explored for adaptive cruise control, collision avoidance systems, and lane departure/keeping systems, among others [7,8]. To design cooperative control architectures for ADAS, DiL architectures are typically formulated by integrating driver

**Citation:** Perozzi, G.; Oudainia, M.R.; Sentouh, C.; Popieul, J.-C.; Rath, J.J. Driver Assisted Lane Keeping with Conflict Management Using Robust Sliding Mode Controller. *Sensors* **2023**, *23*, 4. https://doi.org/10.3390/s23010004

Academic Editors: Yafei Wang, Chao Huang, Peng Hang, Zhiqiang Zuo and Bo Leng

Received: 17 November 2022 Revised: 13 December 2022 Accepted: 16 December 2022 Published: 20 December 2022

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

attributes such as workload, experience, and skill in the control design. For effective action reflecting such attributes, various driver models based on neuromuscular dynamics [9], data-driven approaches [10], hand impedance [11], and vision/preview [5] have been developed. In this work, the avenue of cooperative control for lane-keeping assistance (LKA) systems considering the steering input (angle or torque) as a control signal is explored with a focus on HMI management and vehicle positioning error minimization.

#### *1.1. State of the Art*

Many works can be found in the literature dealing with the design of controllers for trajectory following [12]. Among all robust controllers, the sliding mode law is widely recognized as one of the most effective at rejecting external matched perturbations [13], so it can be used to reject perturbations that affect road vehicles. The system disturbances and parameter uncertainties introduced by human–machine cooperative driving are also inevitable. Ref. [14] proposed a control method to solve the above problems. Optimization algorithms have also been used to reduce the computational cost of implementing the control law in real-time applications [15]. Active fault-tolerant controllers have been largely used to increase plant availability and reduce the risk of safety hazards, preventing simple faults from developing into serious failures [16,17]. The last decade witnessed a great development of automated driving vehicles and vehicle intelligence. The significant increment of machine intelligence poses a new challenge to the community, which is the collaboration between human drivers and vehicle autonomy. In [18], a literature review was conducted and perspectives on the human behaviors and cognition (HBC) for ADVs toward human-autonomy (H-A) collaboration were proposed.

Various cooperative control architectures have been proposed in [5,7,8,19,20] based on DiL designs. In [21], a driver model using a weighting process of visual guidance from the road ahead and haptic guidance from a steering system for a lane-following task was proposed. In [3,22–24], haptic feedback from the steering wheel was used to ensure that both the driver and the autonomous controller participated in the driving action. In [25], an extended shared steering control system with an authority adaptive allocation model was proposed to improve the reliability of the shared steering control system and weaken the influence of uncertain driver behavior on driving safety. Ref. [26] presented a shared control framework based on handling inverse dynamics and driving intention for lane changing; in particular, the influence of the driver's lane-changing start point and end point is considered in the design of the shared controller. In [6], a cooperative control approach for lane keeping based on *H*<sup>2</sup> preview control was proposed by incorporating a neuromuscular driver model. Similarly, in [20], a haptic shared control between driver and ecopilot considered the use of driver torque as haptic feedback to design T-S fuzzy controllers for lane keeping. In [19], for varying driver steering characteristics such as delays and preview time, a DiL gain-scheduling *H*∞ robust shared controller was proposed. These approaches typically validated the cooperative performance of the DiL design for lane-keeping tasks in the presence of driver parameter uncertainty and environmental disturbances such as crosswinds and road curvature. Although efficient lane-keeping performance under various driving conditions was validated, the issues of conflict between the human driver and the autonomous controller, driver workload management, and performance enhancement were not explicitly addressed.

Driver workload typically characterizes the driving action required by the human driver to perform a typical task. Based on monitored cognitive states (mental workload) and physical driving effort (physical workload) applied by the driver, the workload can be categorized into under-load, normal, and over-load regions [5,27]. The mental workload of the driver reflects the state of involvement of the driver in the driving task. Typically, the driver's state of drowsiness [5,28], the intention of driving action [28], and meticulous steering action [29] are employed as indicators of the mental workload. The physical workload of the driver can be determined by monitoring the applied driver torque/steer input and the steer reversal rate. The objective of a cooperative LKA strategy is then adapting the driver activity in terms of workload into the controller design for effective management of HMI and keeping vehicles on the lane. In [30], an optimal modulation policy was designed with a cost function, then a nonlinear stochastic model predictive approach was used to solve the cost function subject to probabilistic uncertainties in the human's biomechanics. In [27], the relationship between driver workload and the level of assistance required was explored for the design of an LKA controller to improve driver performance. Takagi-Sugeno (T-S) models [31,32] used driver activity functions considering driver state, torque, and intention, which replicate the level of assistance required during a typical task [27].

The conflict between the human driver and the autonomous controller typically occurs when both agents have different actions for the same driving task. Such scenarios arise during the transition of authority between the agents, sudden maneuvers executed by the driver/automation that are not predicted by the other agent, and extreme maneuvers, i.e., sharp curve negotiation. In [4,6], based on cooperative status detection, a conflict-free smooth transition of authority between the human driver and the autonomous controller was proposed. Similarly, in [23], conflict mitigation by adapting the parameters of the controller with respect to individual drivers was proposed. Extending the work of [31], a cooperative control approach employing T-S models was proposed in [5] to perform lane keeping and conflict minimization simultaneously. In [33], a haptic control architecture was developed for the smooth transition of control authority with adaptation to driver cognitive workload. In the works of [6,19,20,31,32], the controllers designed were based on the linear bicycle model, which did not account for varying tire friction forces. The works in [6,19] assumed constant longitudinal speed in the design of lane-keeping controllers. Further, conflicts between the driver and the automated driving system were not explicitly addressed in [19,32]. In [5,31,33], by the design of shared control dependent on driver attributes, the issue of conflict between the driver and the automated system was addressed for variable and fixed longitudinal speeds. However, these works were analyzed for the linear bicycle model, which does not consider the saturation of tire friction forces during extreme maneuvers.

#### *1.2. Proposed Methodology*

To account for tire-force non-linearities and environmental disturbances, management of HMI between the human driver and the autonomous controller with respect to driver workload, and conflict management, a robust cooperative control approach is proposed in this work. Based on the non-linear representation of tire-friction dynamics [34] integrated with a human driver model developed using visual cues [5], a DiL design is formulated. The HMI between the human driver and the autonomous controller is then developed based on adaptation to driver workload and subsequent driver performance. For adaptation to driver performance, a non-linear representation of driver activity based on physical and cognitive workload is formulated. For quantifying adaptive physical workload, a rule-based logic is used to explore the relationship between lateral acceleration, predicted road curvature, and maximum driver torque. Based on the developed DiL model dynamics, a novel robust nonlinear feedback controller based on adaptive higher-order sliding mode (HOSM) [35,36] is developed for the system. The conflict is managed by the introduction of a sharing parameter, which is a function of the driver and assistance torques, in the input-dependent sliding surface. The developed feedback control is then modulated using the non-linear function built on the relationship between driver performance and the level of assistance required, for effective HMI management. The closed-loop stability of the time-varying system dynamics involving the non-linear modulating function, DiL dynamics, and environmental disturbances is then established.

#### *1.3. Contribution*

The main contributions of this work are:

• The introduction of a shared control parameter into the control design to minimize conflict between the human driver and automated driving system.


The manuscript is organized as follows. Section 2 introduces the driver–vehicle–lane model. Section 3 focuses on the design of the proposed controller. Extensive discussions about the performance of the proposed approach with regard to lane position error reduction, driver satisfaction, and the influence of the conflict parameter are provided in Section 4.

#### **2. Problem Formulation: Driver Adapted Lane Keeping**

The time-varying dynamics governing a DiL vehicle model in the presence of environmental disturbances for lateral control and the problem of designing a closed-loop controller to manage the HMI between a driver and an autonomous controller are discussed in this section. The symbols of this manuscript are defined in the Nomenclature section.

#### *2.1. DiL Modeling: Vehicle-Road-Driver Dynamics*

The DiL model development is carried out by integrating the vehicle's lateral and yaw motion dynamics with the steering column dynamics, the lane tracking dynamics, and a linear model of the human driver's torque. The governing dynamics for the lateral motion of the vehicle, under the assumption of negligible influence of the longitudinal dynamics, can be efficiently represented using the non-linear bicycle dynamic model [1,37] as in Equations (1) and (2).

$$Mv\_x \dot{\beta} = F\_{yr} + F\_{yf} \cos(\delta\_f) - Mv\_x \dot{\psi}\_v + F\_w \tag{1}$$

$$I\_z \ddot{\psi}\_v = l\_f F\_{yf} \cos(\delta\_f) - l\_r F\_{yr} + M\_w \tag{2}$$

where *β* is the side slip angle, *δf* is the front steering angle, *ψv* is the heading angle, *Fy f* , *Fyr* are the front and rear tire friction forces, *Fw* is the crosswind force, and *vx* is the longitudinal velocity. To represent the tire–road friction conditions, several linear, adaptive, uncertain, and nonlinear models, such as the Brush-Tire (BT) friction model and the LuGre friction model, among others, are employed [38]. Although the nonlinear models represent the dynamic characteristics of tire–road friction, these models are not easily applicable in control approaches due to their highly complex behavior and dynamics. The linear uncertain friction model [39] has been employed in this work for controller development. The lateral tire friction forces and the self-aligning torque of the steering wheel are then given as in Equations (3) and (4).

$$F\_{yi} = 2C\_i \alpha\_i + \Delta F\_i, \quad i \in \{f, r\} \tag{3}$$

$$T\_s = \frac{K\_p \eta\_t F\_{yf}}{R\_s} \tag{4}$$

with *αf* , *αr* denoting the front and rear slip angles, *Ts* denoting the self-aligning torque, and Δ*Fi* denoting the lumped uncertain part of the tire friction forces, indicative of the effects of changing road conditions, tire pressure variations, saturation, etc., which can be modeled using any of the above-discussed dynamic friction models. The variable *Kp* ∈ (0, 1] is a ratio denoting the level of assistance from the active steering system. In the absence of any active steering support, *Kp* = 1. The front and rear slip angles under small-angle assumptions are given as in Equations (5) and (6).

$$\alpha\_f = \delta\_f - \frac{\beta v\_x + l\_f \dot{\psi}\_v}{v\_x} \tag{5}$$

$$\alpha\_r = -\frac{\beta v\_x - l\_r \dot{\psi}\_v}{v\_x} \tag{6}$$

Under the small angle assumptions, the above non-linear bicycle model dynamics appropriately represent the vehicle motion for low curvature roads and have been widely employed for shared lateral control [5,7].
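To make the small-angle model concrete, the snippet below evaluates the slip angles of Equations (5) and (6) and the corresponding linear lateral forces of Equation (3) with the uncertain parts Δ*Fi* set to zero. The cornering stiffnesses and geometry are illustrative placeholder values, not the paper's identified parameters.

```python
def slip_and_forces(beta, psi_dot, delta_f, v_x,
                    C_f=57000.0, C_r=59000.0, l_f=1.22, l_r=1.44):
    """Front/rear slip angles (small-angle form) and the linear lateral
    tire forces F_yi = 2*C_i*alpha_i, neglecting the uncertain Delta_F_i."""
    alpha_f = delta_f - (beta * v_x + l_f * psi_dot) / v_x   # Eq. (5)
    alpha_r = -(beta * v_x - l_r * psi_dot) / v_x            # Eq. (6)
    return alpha_f, alpha_r, 2 * C_f * alpha_f, 2 * C_r * alpha_r
```

In straight-line driving (*β* = *ψ̇v* = *δf* = 0) all slip angles and forces vanish, while a small steering input generates a front-axle force only, which then excites the yaw dynamics of Equation (2).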

The vehicle's lane tracking performance can be modeled using two error variables, *yl* and Ψ*l*, which indicate the lateral deviation error and the orientation error of the vehicle with respect to the lane center-line at a specified look-ahead distance as shown in Figure 1.

**Figure 1.** The nonlinear bicycle model of vehicle.

These lane errors are readily obtained using vision-based sensors from the vehicle perception unit. The dynamics of these error variables are given, as in [5], by Equations (7) and (8).

$$\dot{y}\_l = \beta v\_x + l\_s \dot{\psi}\_v + \Psi\_l v\_x, \tag{7}$$

$$\dot{\Psi}\_l = \dot{\psi}\_v - \rho\_r v\_x \tag{8}$$

with *yl*, Ψ*<sup>l</sup>* representing the lateral offset error and the heading error respectively. With the road-vehicle dynamics considered, the interaction between the human driver and the vehicle is then modeled by considering the steering-column dynamics with only basic assist provided [5,7] as in Equation (9).

$$I\_s \ddot{\delta}\_d = T\_d + T\_a - T\_s - B\_s \dot{\delta}\_d \tag{9}$$

where *Td*, *Ta* represent the driver and the automation torques, respectively. Integrating the dynamics (8) and (9), an autonomous controller can be designed to generate the assistance torque *Ta* which can maintain the vehicle on the lane. Further, the consideration of the steering column dynamics also helps in informing the human driver of the external road conditions directly.

#### *2.2. HMI Management: Driver Workload-Level of Assistance*

Driver-adaptive LKA systems intend to provide assistance to human drivers in difficult and adverse scenarios [5,40–42]. Specifically, adaptation techniques are designed such that the physical and mental workload of drivers during driving can be easily managed. Using measured vehicle responses such as steering torque, steering wheel reversal rate, and jerk, among others, the physical workload of a driver is quantified [5,27,42]. Similarly, based on measured driver responses such as gaze monitoring, drowsiness, and the intention to perform a maneuver, the mental workload of a driver can be quantified [5,7,9]. Integrating both these indicators via a nonlinear mapping and relating them to driver performance, various adaptive functions have been proposed by our research group for shared lane-keeping tasks [5,28,42]. On similar lines, we consider the use of normalized driver torque and driver distraction levels as indicators of the driver's physical and cognitive workloads, respectively. The entire procedure is carried out in three steps as shown below:

• *Identification of driver workload*: The measured driver torque at the steering wheel typically depends on many factors, such as road curvature, lateral acceleration, the preview time, the far point distance, and the dynamics of the human arm, among others. In this work, the adaptive driver torque *Tdm* for various drivers/driving scenarios is computed using a simple rule-based logic with the inputs being lateral acceleration and predicted road curvature [43]. With an increase in lateral acceleration and road curvature, the value of *Tdm* increases, reflecting a higher physical workload of the driver. Mathematically, the normalized maximum driver torque is represented in Equation (10).

$$T\_{dN} = \left| T\_d / T\_{dm} \right| \tag{10}$$

Similarly, the mental workload is accounted for by the driver state (*DS* ∈ [0, 1]), which categorizes the driver's involvement into different levels such as attentive, sleepy, drowsy, and distracted. With an increase in *DS*, the driver is more involved in the driving task and vice versa. In the case when *DS* = 0, the driver is completely distracted, and when *DS* = 1, the driver is actively involved in the driving task. For practical purposes, the *DS* is obtained from the driver monitoring unit (DMU) installed in vehicles, comprising a vision system to monitor driver activity [44]. It is of note that, although different states of the driver are monitored, the output of the DMU is generally binary, indicating an active driver or a distracted driver [28].

• *Mapping driver workload to activity*: In the context of driver workload, effective driver performance decreases with an increase in workload levels. Similarly, at a low activity level (corresponding to low workload), the performance of the driver is also low, as the driver is not significantly involved in the driving task. Analytically, this relationship is expressed as in Equation (11).

$$\gamma = 1 - e^{-(\sigma\_1 T\_{dN})^{\sigma\_2} DS^{\sigma\_3}} \tag{11}$$

where *γ* ∈ [0, 1] indicates driver activity, and *σ*<sup>1</sup> = 2, *σ*<sup>2</sup> = 3, and *σ*<sup>3</sup> = 3 are selected appropriately to consider the degree of influence of the physical and cognitive components on the driver activity. This relationship is presented graphically in the form of a U-shaped function in [27].

• *Activity-based level of assistance generation*: The level of assistance (LOA) required to complete a driving task can be determined similarly to [27], by using the inverse-U relationship between driver performance and LOA. Considering the objective of providing high assistance to the driver during under-load and over-load (i.e., low activity) regions, an analytical mapping for driver performance-LOA is defined as in Equation (12).

$$\mu(\gamma) = \frac{1}{1 + |\frac{\gamma - p\_3}{p\_1}|^{2p\_2}} + \mu\_{\min} \tag{12}$$

The time-varying parameter *μ*(*γ*) ∈ [*μmin*, 1] represents a modulation factor that relates the driver workload-based performance with the LOA for task completion. The parameters *p*<sup>1</sup> = 0.355, *p*<sup>2</sup> = −2, *p*<sup>3</sup> = 0.5 are chosen to replicate the U-shaped relationship as discussed in [27] and shown in Figure 2. A minimum assistance level of *μmin* = 0.2 is used to consider the influence of sensor noise, drift, etc.
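The three steps above can be sketched end to end. The sketch below assumes a minus sign in the exponent of Equation (11) so that *γ* remains in [0, 1], and guards the singular point *γ* = *p*<sup>3</sup> of Equation (12); the parameter values are the ones quoted in the text, and the function names are ours.

```python
import math

def driver_activity(T_dN, DS, s1=2.0, s2=3.0, s3=3.0):
    """Driver activity gamma of Eq. (11), from the normalized driver
    torque T_dN and the driver state DS in [0, 1]."""
    return 1.0 - math.exp(-((s1 * T_dN) ** s2) * (DS ** s3))

def level_of_assistance(gamma, p1=0.355, p2=-2.0, p3=0.5, mu_min=0.2):
    """Modulation factor mu(gamma) of Eq. (12): high assistance in the
    under-load/over-load regions, minimum assistance at gamma = p3."""
    if gamma == p3:  # |.|**(2*p2) diverges here; take the limit mu_min
        return mu_min
    return 1.0 / (1.0 + abs((gamma - p3) / p1) ** (2 * p2)) + mu_min
```

The resulting *μ*(*γ*) then modulates the feedback torque as in Equation (13): a fully distracted driver (*DS* = 0) gives *γ* = 0 and near-full assistance, while a nominally loaded, attentive driver (*γ* ≈ 0.5) receives only the minimum assistance *μmin*.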

The computed level of assistance function can be then used to modulate the assistance torque *Ta* and thus adapt the autonomous control action to the driver as in Equation (13).

$$T\_a = \mu(\gamma) T\_{fb} \tag{13}$$

where *Tfb* is a robust feedback control torque to be designed. Employing the modulated assistance torque, the HMI between the driver and the autonomous controller can be effectively managed for completing a specific driving task.

**Figure 2.** The driver workload and corresponding level of assistance required.

#### **3. Robust DiL Lane Keeping Control: A HOSM approach**

The shared control between the human driver and an LKA controller typically focuses on tracking the desired reference while improving the driver comfort [7,20,42].

#### *3.1. Control Oriented DiL Modeling*

For DiL tasks, we incorporate the influence of driver effort by using a two-point visual driver torque model [5] for developing the control specific model as in Equation (14).

$$T\_d = \mathcal{K}\_c \theta\_{near} + \mathcal{K}\_a \theta\_{far} \tag{14}$$

with *θnear*, *θfar* representing the near and far visual points of the driver along a road curvature. Based on the information from these angles, the driver generates compensatory and anticipatory actions corresponding to the near and far angles, respectively. Subsequently, he/she predicts the future road and generates the anticipated steering action before entering the curve based on the far visual angle. The compensatory behavior of the driver is emphasized for lane-keeping aspects. This driver behavior is represented using the anticipatory and compensatory gains *Ka* and *Kc*, respectively, as shown in (14). For further details, please refer to [5].

Integrating the above dynamics in Equations (1)–(9) with the driver torque model in Equation (14), a DiL lane-keeping model of the following form can be formulated in Equation (15).

$$\dot{\mathbf{x}}(t) = A(t)\mathbf{x}(t) + B(t)T\_d(t) + E(t)\omega(t) \tag{15}$$

with the states as $x = \begin{bmatrix} x\_1 & x\_2 & x\_3 & x\_4 & x\_5 & x\_6 \end{bmatrix}^T = \begin{bmatrix} \beta & \dot{\psi}\_v & y\_l & \Psi\_l & \delta\_d & \dot{\delta}\_d \end{bmatrix}^T$. The system matrices are given in (16) and (17).

$$A(t) = \begin{bmatrix} a\_{11} & a\_{12} & 0 & 0 & a\_{15} & 0 \\ a\_{21} & a\_{22} & 0 & 0 & a\_{25} & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ \upsilon\_x & l\_s & \upsilon\_x & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ a\_{61} & a\_{62} & a\_{63} & a\_{64} & a\_{65} & a\_{66} \end{bmatrix}, B = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ b\_1 \end{bmatrix} \tag{16}$$

$$E(t) = \begin{bmatrix} e\_1 & e\_2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -v\_x & 0 & 0 \\ e\_1 & e\_3 & 0 & 0 & 0 & e\_5 \\ e\_1 & e\_4 & 0 & 0 & 0 & 0 \end{bmatrix}^T, \quad \omega(t) = \begin{bmatrix} F\_w \\ \rho\_r \\ \Delta F\_{yf} \\ \Delta F\_{yr} \end{bmatrix} \tag{17}$$

with $a\_{11} = -2(C\_f + C\_r)/(Mv\_x)$, $a\_{12} = 2(l\_rC\_r - l\_fC\_f)/(Mv\_x^2)$, $a\_{15} = 2C\_f/(Mv\_xR\_s)$, $a\_{21} = 2(l\_rC\_r - l\_fC\_f)/I\_z$, $a\_{22} = -2(l\_r^2C\_r + l\_f^2C\_f)/(I\_zv\_x)$, $a\_{25} = 2C\_fl\_f/(I\_zR\_s)$, $a\_{61} = 2C\_f\eta\_t/(I\_sR\_s) + K\_c\tau\_a^2 a\_{21}$, $a\_{62} = 2C\_fl\_f\eta\_t/(I\_sR\_sv\_x) + K\_c(\tau\_a + \tau\_a^2 a\_{22})$, $a\_{63} = K\_a/I\_s$, $a\_{64} = K\_a/(I\_sv\_x\tau\_p)$, $a\_{65} = -2C\_f\eta\_t/(I\_sR\_s^2) + K\_c\tau\_a^2 a\_{25}$, $a\_{66} = -B\_s/I\_s$, $b\_1 = 1/I\_s$, $e\_1 = 1/(Mv\_x)$, $e\_2 = l\_w/I\_z$, $e\_3 = l\_f/I\_z$, $e\_4 = -l\_r/I\_z$, and $e\_5 = -K\_p\eta\_t/(I\_sR\_s)$.

The autonomous assistance torque $T_a$ for completing the driving task in the presence of the disturbances $\omega$ and the uncertainties $\Delta$ can now be designed. Integrating the assistance modulation factor developed earlier, the DiL model used for controller design can be expressed as in Equation (18).

$$\dot{\mathbf{x}} = A(t)\mathbf{x}(t) + B\_1(t)T\_{fb}(t) + E(t)\omega(t) \tag{18}$$

with $B_1 = B\mu(\gamma)$ and $T_{fb}$ the control torque to be designed for stabilizing the DiL system.

#### *3.2. Control Objectives for LKA*

The control objectives for the above DiL lane-keeping task are formulated as:

• Minimization of lane tracking errors: The lane tracking errors given in Equation (8) comprise the lateral deviation and heading-angle errors. To quantify the lane error at a look-ahead distance, the parameter $e_l$ is defined as in Equation (19).

$$e\_l = y\_l + l\_s \Psi\_l \tag{19}$$

The control objective is then to ensure that the front wheels of the vehicle remain within a strip ($\pm d = 1.5$ m) along the lane center line, i.e., that the condition in Equation (20) holds.

$$|e_l| \le \frac{2d - w_r}{2} \tag{20}$$

where *wr* denotes the width of the vehicle.


• Minimization of driver–automation conflict: To quantify the conflict between the driver and the assistance, an auxiliary variable $x_{cf}$ is introduced through its dynamics in Equation (21).

$$\dot{x}_{cf} = T_d^s - \lambda_c T_a \tag{21}$$

where $\lambda_c$ is any positive parameter reflecting the level of sharing, and $T_d^s$ represents the driver torque measured at the steering wheel. In the case of conflict, $\dot{x}_{cf} \to 0$, from which it can be deduced that $\lambda_c T_a \approx T_d^s$. Hence, by an appropriate design of the parameter $\lambda_c$, the influence of the assistance torque can be reduced.
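The first objective above can be checked numerically. The sketch below evaluates the look-ahead error of Equation (19) against the corridor bound of Equation (20); the look-ahead distance $l_s$ and vehicle width $w_r$ used here are illustrative assumptions, since only $d = 1.5$ m is given in the text.

```python
# Quick numeric check of the look-ahead error and corridor condition in
# Equations (19) and (20). The look-ahead distance l_s and vehicle width w_r
# below are illustrative assumptions; only d = 1.5 m is given in the text.

def lane_error(y_l: float, psi_l: float, l_s: float) -> float:
    """Look-ahead lane error e_l = y_l + l_s * Psi_l, Equation (19)."""
    return y_l + l_s * psi_l

def within_corridor(e_l: float, d: float = 1.5, w_r: float = 1.8) -> bool:
    """Corridor condition |e_l| <= (2d - w_r) / 2, Equation (20)."""
    return abs(e_l) <= (2.0 * d - w_r) / 2.0

e = lane_error(y_l=0.3, psi_l=0.05, l_s=5.0)   # e_l = 0.55 m
print(e, within_corridor(e))                    # inside the 0.6 m bound here
```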

For the above control objectives, we now propose a robust HOSM controller for the DiL dynamics in Equation (18) to design the torque $T_{fb}$ and, subsequently, $T_a$.

#### *3.3. Robust HOSM Controller*

Integrating the above control objectives, a linear error surface to be regulated can be defined as in Equation (22).

$$\sigma_c = k_1 e_l + k_2 \dot{e}_l + k_3 \dot{\delta}_d + k_4 x_{cf} \tag{22}$$

for gains $k_i > 0$, $i = 1, \dots, 4$, designed to ensure the convergence of the error surface. To stabilize the DiL system and ensure that the tracking error $\sigma_c$ converges to a stable equilibrium, the following finite-time controller is proposed.

**Theorem 1.** *For the DiL system in Equation* (18)*, the feedback control $T_{fb}$ which ensures that the tracking error $\sigma_c$ in Equation* (22) *converges to a practically stable equilibrium can be designed as in Equation* (23)*.*

$$T_{fb} = \frac{1}{\Omega_u \mu(\gamma)}\left(-\Omega_c + \nu(\sigma_c)\right) \tag{23}$$

*where* Ω*c*, Ω*<sup>u</sup> are defined later and a novel robust HOSM control ν*(*σc*) *to reject the effect of disturbances is defined as in Equation* (24)*.*

$$\nu(\sigma_c) = -\alpha_1 \nu_1(\sigma_c) - \alpha_2 \int_0^t \nu_2(\sigma_c)\, dt \tag{24}$$

*with $\nu_1(\sigma_c) = |\sigma_c|^{\eta_1} \mathrm{sign}(\sigma_c)$ and $\nu_2(\sigma_c) = |\sigma_c|^{\eta_2} \mathrm{sign}(\sigma_c)$, where $1 - 2\eta_1 + \eta_2 = 0$, $1 > \eta_1 \ge 0.5$, and $\alpha_1, \alpha_2, \alpha_3 > 0$ are positive constants.*

**Proof.** The dynamics of the tracking error *σ<sup>c</sup>* can be expressed as in Equation (25).

$$\begin{split} \dot{\sigma}_c &= k_1 \dot{e}_l + k_2 \ddot{e}_l + k_3 \ddot{\delta}_d + k_4 \dot{x}_{cf} \\ &= \beta f_1 + \dot{\psi} f_2 + \Psi_l f_3 + \delta_d f_4 + f_5 + \Delta_t + (k_3 b_1 - k_4 \lambda_c)\, \mu(\gamma)\, T_{fb} \\ &= \Omega_c + \Omega_u \mu(\gamma) T_{fb} + \Delta_t \end{split} \tag{25}$$

where $\Omega_c = \beta f_1 + \dot{\psi} f_2 + \Psi_l f_3 + \delta_d f_4 + f_5$, $\Omega_u = k_3 b_1 - k_4 \lambda_c$, $f_1 = k_1 v_x + k_2 v_x a_{11} + 2 k_2 l_s a_{21} + k_3 a_{61}$, $f_2 = 2 k_1 l_s + k_2 v_x a_{12} + 2 k_2 l_s a_{22} + k_2 v_x + k_3 a_{62}$, $f_3 = k_1 v_x + k_3 a_{64}$, $f_4 = k_2 v_x a_{15} + 2 k_2 l_s a_{25} + k_3 a_{65}$, $f_5 = k_3 a_{63} y_l + k_3 a_{66} \dot{\delta}_d + k_4 T_d^s$, $\Delta_t = f_{dt} + f_{dt1} - k_1 l_s v_x \rho_r + e_3 \Delta F_{yf}$, $f_{dt} = k_2 \left[ \beta \frac{\partial v_x}{\partial t} + \Psi_l \frac{\partial v_x}{\partial t} - l_s \frac{\partial v_x}{\partial t} \rho_r - l_s v_x \frac{\partial \rho_r}{\partial t} \right]$, and $f_{dt1} = k_2 v_x (e_1 F_w + e_1 \Delta F_{yf} + e_4 \Delta F_{yr}) + 2 k_2 l_s (e_2 F_w + e_2 \Delta F_{yf} + e_5 \Delta F_{yr}) - k_2 \rho_r v_x^2$.

Substituting for the feedback control designed in Equation (23), the error dynamics can be now expressed as in Equation (26).

$$\dot{\sigma}_c = \nu(\sigma_c) + \Delta \tag{26}$$

The lumped disturbance $\Delta$ consists of the effects of road curvature, crosswinds, and uncertain tire friction forces. For all practical operating conditions, these disturbances and their time derivatives can be assumed to be bounded. With simplifications of the expression in Equation (25), it can further be shown that the lumped disturbance can be decomposed as $\Delta = \Delta_1(\sigma_c) + \Delta_2$. The disturbance terms can be shown to be bounded as in Equations (27) and (28).

$$\|\Delta_1\| \le \chi_1 \|\sigma_c\| \tag{27}$$

$$|\dot{\Delta}\_2| \le \chi\_2 \tag{28}$$

where $\chi_1, \chi_2 > 0$ are positive bounding constants.

Now consider the following Lyapunov function in Equation (29).

$$V_c = \Sigma^T Q_c \Sigma \tag{29}$$

with $\Sigma = \left[ \nu_1(\sigma_c), \int_0^t \nu_2(\sigma_c)\, dt \right]^T$ and $Q_c = Q_c^T > 0$ a positive definite matrix defined, as in [35], in Equation (30).

$$Q\_{\mathcal{E}} = \frac{1}{2} \begin{bmatrix} (4\alpha\_2 + \alpha\_1^2) & \alpha\_1 \alpha\_3 & -\alpha\_1 \\ \alpha\_1 \alpha\_3 & (1 + \alpha\_3^2) & -\alpha\_3 \\ -\alpha\_1 & -\alpha\_3 & 2 \end{bmatrix} \tag{30}$$
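As a quick numerical aside, the positive definiteness of $Q_c$ in Equation (30) can be verified directly for a concrete gain set. The sketch below uses the STA gains reported in Section 4.1 ($\alpha_1 = 33.9379$, $\alpha_2 = 150$, $\alpha_3 = 11.2697$); for these values all three eigenvalues are positive.

```python
import numpy as np

# Numerical sanity check that the matrix Q_c of Equation (30) is positive
# definite for the gains reported later in the paper (alpha_1 = 33.9379,
# alpha_2 = 150, alpha_3 = 11.2697).

a1, a2, a3 = 33.9379, 150.0, 11.2697

Qc = 0.5 * np.array([
    [4.0 * a2 + a1**2, a1 * a3,      -a1],
    [a1 * a3,          1.0 + a3**2,  -a3],
    [-a1,              -a3,           2.0],
])

eig = np.linalg.eigvalsh(Qc)      # Qc is symmetric, so eigvalsh applies
print(eig.min() > 0)              # True: Q_c is positive definite
```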

The above Lyapunov function satisfies the condition in Equation (31).

$$\lambda_{min} \|\Sigma\|^2 \le V_c \le \lambda_{max} \|\Sigma\|^2 \tag{31}$$

with $\lambda_{min}$, $\lambda_{max}$ representing the minimum and maximum eigenvalues of $Q_c$, respectively. The rate of evolution of this Lyapunov function can be computed, similarly to [35], as in Equation (32).

$$\dot{V}_c = -\frac{1}{\|\sigma_c\|^{(1-\eta_1)}} \Sigma^T Q_{c1} \Sigma - \Sigma^T Q_{c2} \Sigma \tag{32}$$

where $Q_{c1}$ and $Q_{c2}$ are two positive definite matrices. By choosing the gains as $\alpha_1 > \left[ (2\alpha_2 \chi_1 + \alpha_3 \chi_2 - \alpha_2 \alpha_3)/(2\alpha_3 - 0.5\chi_1) \right]^{0.5}$, $\alpha_2 > (2\chi_2 - \alpha_1^2)/2$, and $\alpha_3 > \chi_1 (\alpha_1^2/2 + 2\alpha_2)/(\alpha_2 + 2\alpha_1^2 - \chi_2)$, it can be shown, similarly to [35], that Equation (33) is valid.

$$\dot{V}_c \le -\frac{1}{\|\sigma_c\|^{(1-\eta_1)}} \lambda_{min}(Q_{c1}) \|\Sigma\|^2 - \lambda_{min}(Q_{c2}) \|\Sigma\|^2 \tag{33}$$

Thus, with a proper selection of the gains $\alpha_i > 0$, the derivative $\dot{V}_c$ is negative definite and the sliding surface converges, attaining practical bounded stability.
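To make the behavior of the control law in Equation (24) concrete, the following is a minimal scalar simulation of the error dynamics $\dot{\sigma}_c = \nu(\sigma_c) + \Delta$ from Equation (26). The gains, exponents, and disturbance used here are illustrative assumptions, not the tuned values used in the paper.

```python
import numpy as np

# Sketch of the generalized HOSM (super-twisting-type) law of Equation (24):
#   nu(sigma) = -a1*|sigma|^eta1*sign(sigma) - a2 * int |sigma|^eta2*sign(sigma) dt
# with 1 - 2*eta1 + eta2 = 0. Gains, disturbance, and the scalar error
# dynamics are illustrative assumptions, not the paper's tuned closed loop.

eta1 = 0.6
eta2 = 2.0 * eta1 - 1.0          # enforces 1 - 2*eta1 + eta2 = 0
a1, a2 = 2.0, 1.5                # assumed positive gains
dt, T = 1e-3, 10.0

sigma, z = 1.0, 0.0              # error surface and integral state
for k in range(int(T / dt)):
    t = k * dt
    nu1 = abs(sigma) ** eta1 * np.sign(sigma)
    nu2 = abs(sigma) ** eta2 * np.sign(sigma)
    nu = -a1 * nu1 - a2 * z      # control of Equation (24)
    delta = 0.5 * np.sin(t)      # disturbance with bounded derivative
    sigma += dt * (nu + delta)   # error dynamics of Equation (26)
    z += dt * nu2                # integral term

print(abs(sigma))                # driven to a small neighborhood of zero
```

With $\eta_1 = 0.6$ the constraint $1 - 2\eta_1 + \eta_2 = 0$ gives $\eta_2 = 0.2$; the integral term absorbs the slowly varying disturbance, which is what yields practical (bounded) stability rather than exact convergence.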

In the designed closed-loop shared control of Theorem 1, the sharing parameter $\mu(\gamma)$ is directly accounted for in the design of the feedback input $T_{fb}$, as shown in Equation (23). Thus, the stability of the closed-loop DiL system in Equation (18) can be ensured, in the presence of road disturbances and tire-friction uncertainties, for any authority transfer or shared driving between the driver and the automation system.

**Remark 1.** *In the designed feedback control $T_{fb}$, a singularity can arise when $\Omega_u \mu(\gamma) \to 0$, i.e., if $(k_3 b_1 - k_4 \lambda_c) \to 0$ or if $\mu(\gamma) \to 0$. However, the modulation factor is a positive bounded quantity, i.e., $\mu(\gamma) \in [\mu_{min}, 1]$ as presented earlier, and cannot by itself cause a singularity. Further, by selecting the gains $k_3$, $k_4$, and $\lambda_c$ such that $k_3 b_1 \neq k_4 \lambda_c$, the design of the control input is always feasible.*

A flowchart for the methodology of implementation of the proposed control scheme is presented in Figure 3.

**Figure 3.** Flow chart of the methodology.

#### **4. Validation and Results**

The proposed driver-activity-adapted cooperative LKA controller was validated on a MATLAB/Simulink platform and on the SHERPA vehicle simulator for real-time testing.

#### *4.1. Simulation Studies*

The performance of the proposed approach was evaluated to satisfy the control objectives under the following constraints for safe vehicle operation in Equation (34).

$$|\dot{\psi}| \le \dot{\psi}_{max}, \quad |\Psi_l| \le \Psi_{l_{max}}, \quad |y_l| \le y_{l_{max}}, \quad |\delta_f| \le \delta_{f_{max}}, \quad |\dot{\delta}_f| \le \dot{\delta}_{f_{max}}, \quad |T_a| \le T_{a_{max}} \tag{34}$$

where $\dot{\psi}_{max} = 0.55$ rad/s, $\Psi_{l_{max}} = 0.1$ rad, $y_{l_{max}} = 1.5$ m, $\delta_{f_{max}} = 0.2$ rad, $\dot{\delta}_{f_{max}} = 0.15$ rad/s, and $T_{a_{max}} = 20$ N·m.

For the performance evaluations, the following controllers were compared:

• **Auto-HOSM**: the fully autonomous lane-keeping controller based on the proposed HOSM control law;
• **CLKA-HOSM**: the proposed cooperative lane-keeping shared controller.


The sliding surface gains defined in Equation (22), without the sharing parameter term, were obtained using particle swarm optimization (PSO) [46] for optimal results. Accordingly, each particle was defined as $X = [k_1\; k_2\; k_3]$. The particles thus obtained optimal solutions for the gains based on an objective function formulated to minimize the lane tracking errors and satisfy the system constraints in Equation (34) discussed earlier. We considered a swarm size of 20 and a total of 100 iterations for the PSO algorithm. Using this approach, the sliding surface gains were computed as $k_1 = 3.6085$, $k_2 = 10.5804$, and $k_3 = 0.9706$. Subsequently, the gains of the novel STA controller were selected as $\alpha_1 = 33.9379$, $\alpha_2 = 150$, $\alpha_3 = 11.2697$, and $\beta = 0.6383$ for normal road conditions with unity road friction. The conflict parameter gains were chosen as $k_4 = 0.001$ and $\lambda_c = 1.5$, respectively.
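The PSO tuning step can be sketched as follows. Since reproducing the closed-loop lane-tracking objective would require the full vehicle model, a hypothetical quadratic stand-in objective is used here; the inertia and acceleration coefficients are standard illustrative choices, and only the swarm size (20) and iteration count (100) come from the text.

```python
import numpy as np

# Minimal PSO sketch of the gain-tuning step: each particle is a candidate
# X = [k1, k2, k3]. The true objective in the paper is the closed-loop lane
# tracking cost under the constraints of Equation (34); the quadratic
# stand-in below (with an arbitrary target) only illustrates the mechanics.

rng = np.random.default_rng(0)

def objective(x):
    # Hypothetical stand-in for the lane-tracking cost of a candidate gain set.
    target = np.array([3.6, 10.6, 1.0])
    return float(np.sum((x - target) ** 2))

n_particles, n_iter, dim = 20, 100, 3
w, c1, c2 = 0.7, 1.5, 1.5                       # inertia and acceleration
pos = rng.uniform(0.0, 20.0, (n_particles, dim))
vel = np.zeros((n_particles, dim))
pbest = pos.copy()
pbest_cost = np.array([objective(p) for p in pos])
gbest = pbest[pbest_cost.argmin()].copy()

for _ in range(n_iter):
    r1 = rng.random((n_particles, dim))
    r2 = rng.random((n_particles, dim))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    cost = np.array([objective(p) for p in pos])
    improved = cost < pbest_cost
    pbest[improved] = pos[improved]
    pbest_cost[improved] = cost[improved]
    gbest = pbest[pbest_cost.argmin()].copy()

print(gbest)  # converges near the stand-in target [3.6, 10.6, 1.0]
```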

To replicate the human driver torque for the simulation study, a dynamic driver model based on neuromuscular attributes, time lags, etc., as discussed in [9,43], was employed. Using this driver model with varying parameters, the virtual driver torque for the simulations was replicated. For all validation purposes, the driver gains were set to $K_c = 8.57$ and $K_a = 15.75$, respectively. Accounting for the mental workload, two driver states, i.e., watchful and distracted, were considered to compute the driver state variable $DS$. During the distracted mode, the external driver torque input was scaled by a factor of 0.2 to represent a distracted driver.

The simulations were performed on the Satory test track [5], shown in Figure 4a, under variable longitudinal velocity conditions, i.e., $v_x \in [5, 25]$ m/s, shown in Figure 4b. The lateral acceleration of the vehicle was limited to $|a_y|_{max} \le 2$ m/s², indicating normal driving conditions, as shown in Figure 4c. To evaluate the shared control performance, we considered the human driver to be distracted during $t \in [40, 80]$ s, while during the rest of the driving cycle the driver was watchful. The input driver torque reflecting these conditions is shown in Figure 4d.

**Figure 4.** The road and driver input conditions: (**a**) Curvature; (**b**) Longitudinal velocity; (**c**) Lateral acceleration; (**d**) Driver input torque.

The performance of the proposed CLKA-HOSM controller is presented in Figure 5a–d along with that of the Auto-HOSM controller. Both controllers ensured that the lane tracking errors and the steering rate remained within the prescribed limits discussed earlier. As the shared controller incorporates the human action in the control process, its performance indicators have a higher magnitude than those of the autonomous counterpart. The root mean square (rms) and maximum values of these indicators for the Auto-HOSM controller were computed as $y_{l_{rms}} = 0.57$, $\Psi_{l_{rms}} = 0.0131$, $\dot{\delta}_{d_{rms}} = 2.5917$ and $|y_l|_{max} = 1.19$, $|\Psi_l|_{max} = 0.0469$, $|\dot{\delta}_d|_{max} = 7.6202$, respectively. Similarly, the performance metrics of the CLKA-HOSM controller were $y_{l_{rms}} = 0.5267$, $\Psi_{l_{rms}} = 0.0162$, $\dot{\delta}_{d_{rms}} = 2.0609$, and $|y_l|_{max} = 1.2750$, $|\Psi_l|_{max} = 0.0446$, $|\dot{\delta}_d|_{max} = 5.9638$. These metrics indicate good lane-keeping performance for both controllers. Further, the steering rate performance improves under the proposed CLKA-HOSM controller.

Along with such lane-keeping performance, the conflict between the human driver and the autonomous controller under the CLKA-HOSM controller is also presented in Figure 5d. Using the proposed controller, the conflict is kept within limits such that $T_d T_a > -5$ N².

**Figure 5.** The lane tracking and driver comfort performance for the proposed controller (CLKA-HOSM) and the autonomous controller (Autonomous-HOSM). (**a**) Lateral deviation error; (**b**) Heading error; (**c**) Steering rate; (**d**) Conflict product of driver and automation torques.

For further illustration of the shared control performance, the torques generated by the human driver and autonomous agent along with the driver activity–performance indicators are presented in Figure 6. Based on the driver's activity, the level of assistance torque generated varies for completing the driving task.

**Figure 6.** The HMI under the CLKA-HOSM controller. (**a**) Driver and Assistance Torque; (**b**) Driver activity and the level of assistance provided.

To assess the performance of the shared control activity, the following metrics [42] were also computed over a time interval $\eta$:

$$\text{AFac} = \frac{T_{d_{pow}}}{T_{a_{pow}}}, \quad \text{SW} = \frac{1}{\eta} \int_0^{\eta} T_a(t)\, T_d(t)\, \dot{\delta}_d(t)\, dt \tag{35}$$

• AFac: Denotes the ratio between the efforts generated by the human driver and by the automation for completing the driving task, i.e., as in Equation (36).

$$\mathbf{AFac} = \frac{T\_{d\_{pow}}}{T\_{a\_{pow}}} \tag{36}$$

If AFac > 1, the effort provided by the automation is less than that of the driver, and inversely for AFac < 1.

• SW: This indicates the steering workload and is representative of the effort generated by both agents simultaneously for completing the driving task i.e., in Equation (37).

$$\text{SW} = \frac{1}{\eta} \int_0^\eta T_a(t)\, T_d(t)\, \dot{\delta}_d(t)\, dt \tag{37}$$

A larger magnitude of negative steering workload indicates that the assistance provided by the automation opposes the human driver and is detrimental to shared control.
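A minimal sketch of computing the two metrics from sampled traces follows; the definition of the torque "power" $T_{pow}$ as the integrated magnitude of torque times steering rate is an assumption for illustration, since the paper does not state it explicitly, and the traces are synthetic.

```python
import numpy as np

# Sketch of the shared-control metrics in Equations (36) and (37), computed
# from sampled torque and steering-rate traces. The definition of the torque
# "power" T_pow as the integrated magnitude of torque times steering rate is
# an assumption for illustration; the paper does not state it explicitly.

def _integrate(f, t):
    """Trapezoidal integral of samples f over the time grid t."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t)))

def afac(t, T_d, T_a, d_delta):
    """AFac = T_d_pow / T_a_pow, Equation (36)."""
    return _integrate(np.abs(T_d * d_delta), t) / _integrate(np.abs(T_a * d_delta), t)

def steering_workload(t, T_d, T_a, d_delta):
    """SW = (1/eta) * integral of T_a * T_d * steering rate dt, Equation (37)."""
    return _integrate(T_a * T_d * d_delta, t) / (t[-1] - t[0])

t = np.linspace(0.0, 10.0, 1001)
T_d = 2.0 * np.sin(0.5 * t)        # synthetic driver torque
T_a = 0.5 * T_d                    # automation assisting at half the effort
d_delta = 0.2 * np.cos(0.5 * t)    # synthetic steering rate

print(afac(t, T_d, T_a, d_delta))              # ~2.0: driver works twice as hard
print(steering_workload(t, T_d, T_a, d_delta))
```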

For efficient shared control, the AFac should be less than 1 and the negative steering workload should be low. Using the proposed CLKA-HOSM controller, these metrics are computed as AFac = 0.8192 and Negative SW = −206.6476, indicating a good quality of shared control. To assess the shared control performance further, a performance analysis was carried out for a shared controller based on the proposed HOSM control law but with no conflict parameter, i.e., $k_4 = 0$ and $\lambda_c = 0$ (**SC-NoK4**), and is presented in Table 1. Note that the values of Neg SW (i.e., negative steering workload) and $T_d T_{a_{min}}$ (i.e., the minimum value of the torque product, representing the peak conflict) are less than zero.


**Table 1.** Influence of *k*<sup>4</sup> and *λ<sup>c</sup>* on HMI.

With an increase in the magnitude of $k_4$, the negative steering workload and the maximum value of conflict increase, showing deteriorating shared control performance. Further, the lateral error also increases, from a minimum of 1.219 m to a maximum of 1.889 m, as more control authority is passed to the human driver, from $k_4 = 0$ to $k_4 = 0.01$. A similar trend is seen with increasing values of $\lambda_c$, from $\lambda_c = 0$ to $\lambda_c = 2$. From the presented results, the best performance in terms of lane errors, $|y_l|_{max} = 1.624$, and conflict reduction, $T_d T_a = 5.242$, is obtained for $k_4 = 0.001$ and $\lambda_c = 0.5$. Further, the presence of the gains $k_4$ and $\lambda_c$ improves conflict minimization and negative SW across all aspects in comparison to the case where $k_4 = 0$ and $\lambda_c = 0$.

To ascertain the robustness of the proposed CLKA-HOSM controller, random parametric uncertainties in the vehicle and driver parameters were considered. Specifically, uncertain values of $M$, $I_z$, and $I_s$, which are susceptible to payload, wear and tear, etc., were employed. Similarly, uncertainty in the driver model parameters $K_a$ and $K_c$, accounting for various driver behaviors, was also considered. The lane-keeping and conflict reduction performance of the CLKA-HOSM (i.e., C1) and SC-NoK4 (i.e., C2) controllers under the influence of such uncertainties is presented in Table 2.


**Table 2.** Influence of uncertainties on controller performance.

Under the influence of vehicle and driver uncertainties of up to 20%, the proposed CLKA-HOSM controller performs well in ensuring lane keeping ($|y_l|_{rms} = 0.52$, $|\Psi_l|_{rms} = 0.017$ for CLKA-HOSM, against $|y_l|_{rms} = 0.47$, $|\Psi_l|_{rms} = 0.018$ for SC-NoK4) and also in minimizing the conflict between the driver and the autonomous system ($T_d T_{a_{min}} = 6.705$ for CLKA-HOSM, against $T_d T_{a_{min}} = 7.169$ for SC-NoK4). The CLKA-HOSM controller outperforms the SC-NoK4 controller in handling uncertainties, establishing the significance of the gains $k_4$ and $\lambda_c$ in performance enhancement.

#### *4.2. Experimental Results: SHERPA Vehicle Simulator*

The shared DiL-LKA approach was validated in real-time on the SHERPA vehicle simulator shown in Figure 7.

**Figure 7.** Experimental setup for the SHERPA vehicle simulator.

The SHERPA simulator is built from a modified Peugeot 206 vehicle mounted on a Stewart platform and is composed of multiple modules handling driving-related tasks such as perception, path planning, driver monitoring, and human–machine interface management. For more details on the SHERPA simulator, refer to [5]. Using the driver monitoring unit, the driver state is directly available as a binary input, while the torque is measured via a sensor on the steering wheel. With haptic feedback provided via the steering wheel, this simulator setup has been used to validate direct shared control schemes [5,43] similar to the one proposed in this work.

Using the SHERPA setup (with a discretization time of 0.01 s), we now present illustrative results highlighting the lane-keeping and conflict-reduction performance of the proposed shared DiL controller, further supporting the earlier simulation-based analysis. All performance evaluations on the SHERPA simulator were made on the test track represented in Figure 8, which comes from the CoCoVeA project (Coopération Conducteur-Véhicule Automatisé).

**Figure 8.** CoCoVeA track and lanes directions along with road curvature of the sections.

The results for the robustness of the Auto-HOSM controller against longitudinal speed and road friction variations are presented first to highlight the robustness of the proposed novel control law. For the multiple driving tests performed, the aggregated results are presented in Table 3. It can be seen that variations in the longitudinal speed of the vehicle and in the road friction do not affect the performance of the proposed controller. The controller ensures good trajectory tracking by maintaining the lateral deviation below $|y_l|_{max} = 0.5824 < 1.5$ m and the maximum heading error below $|\Psi_l|_{max} = 0.0074 < 0.1$ rad, without saturating the motor control of the steering system, $|T_a|_{max} = 1.2104 < 20$ N·m.


**Table 3.** Influence of speed variation and friction on the performance of the lane-keeping controller.

In the second case, the proposed CLKA-HOSM controller was tested in an obstacle avoidance scenario with sharing parameter values $k_4 = 15$ and $\lambda_c = 0.5$. Accordingly, as shown in Figure 9, three obstacles were placed on the road, and the driver was asked to avoid them by changing lanes. For comparison, the same test was also repeated with the Auto-HOSM controller weighted by the LOA function presented in Equation (12). The performance results for both controllers are presented in Figure 9.

**Figure 9.** (**a**) The obstacle avoidance scenario; (**b**) comparison between the LKA controller weighted with the LOA function and the CLKA-HOSM controller for the minimization of the conflict for an obstacle avoidance scenario using a metric Integral of Conflict.

In Figure 9b, the metric *Integral of Conflict* is defined as $IOC = -\frac{1}{\tau} \int_0^{\tau} T_a(t)\, T_d(t)\, dt$ for a time period $\tau$. It was observed that the CLKA-HOSM controller is more efficient in terms of conflict minimization, i.e., its maximum value of the integral of conflict is 2.7, whereas for the Auto-HOSM controller weighted with the LOA function the maximum value was 7.8. In the case of the proposed CLKA-HOSM controller, AFac = 0.8818, Negative SW = 14.9646, and $(T_d T_a)_{min} = -4.8727$ were obtained. In contrast, for the Auto-HOSM controller weighted with the LOA function, AFac = 0.974, Negative SW = 108.218, and $(T_d T_a)_{min} = -24.539$ were obtained. These results show that the proposed CLKA-HOSM outperforms the other design in terms of shared control performance.
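A minimal sketch of the IOC computation from sampled torque traces follows; the traces below are illustrative, not experimental data.

```python
import numpy as np

# The Integral of Conflict metric used in Figure 9b:
#   IOC = -(1/tau) * integral of T_a(t) * T_d(t) dt.
# The torque product is negative when the two torques oppose each other,
# so a larger IOC indicates more conflict. The traces are illustrative.

def ioc(t, T_a, T_d):
    """IOC = -(1/tau) * integral of T_a * T_d dt over the window t."""
    tau = t[-1] - t[0]
    prod = T_a * T_d
    integral = float(np.sum(0.5 * (prod[1:] + prod[:-1]) * np.diff(t)))
    return -integral / tau

t = np.linspace(0.0, 10.0, 1001)
T_d = np.full_like(t, 2.0)     # driver steers one way...
T_a = np.full_like(t, -1.0)    # ...automation pushes the other way
print(ioc(t, T_a, T_d))        # ~2.0: sustained opposition
```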

For further analysis of the shared control performance, the parameters *k*<sup>4</sup> and *λ<sup>c</sup>* were varied and tests were performed. Performance results for the CLKA-HOSM controller under such variations are shown in Table 4.


**Table 4.** Influence of *k*<sup>4</sup> and *λ<sup>c</sup>* on HMI.

It can be seen in Table 4 that the sharing parameters have a significant impact on the AFac metric, which ranges from 0.4794 to 1.0604, and on the SW metric, which ranges from −14.9646 to −348.7971. For the chosen best combination, these metrics are AFac = 0.8818 and SW = −14.9646, indicating a good quality of shared control. From the presented results, the best performance in terms of conflict reduction is obtained for the combination $k_4 = 15$ and $\lambda_c = 0.5$.

#### **5. Conclusions**

In this work, a novel robust shared controller for a DiL lane-keeping assistance system was proposed and evaluated. The HMI was managed via an adaptive mapping reflecting driver performance corresponding to the identified physical and mental workload of the driver. Along with lane tracking errors and driver comfort enhancement, the issue of conflict between the driver and the autonomous controller was also addressed through the introduction of a novel sharing parameter. To address these objectives, a novel higher-order sliding mode control algorithm was proposed and its stability for the closed-loop DiL system affected by disturbances was established.

The performance of the proposed controller was evaluated via simulations and via experiments on the SHERPA vehicle simulator for different longitudinal velocities, different road friction conditions, the time-varying road curvature of the Satory test track, parametric uncertainties, and obstacle avoidance scenarios. A comparison between the fully autonomous controller, the proposed shared control law without the novel conflict-reduction parameter, and the proposed shared control law with this parameter was discussed extensively. From the experimental results, the fully autonomous controller achieved the best lane tracking and heading error performance (30% better than the shared control law), but the shared control law achieved the best conflict minimization (65.38% better than the shared control law without the novel term). Further, the cooperative driving quality improved by 9.4%, and the negative steering workload was reduced by 86.13% in comparison to the Auto-HOSM controller, showing the efficiency of the proposed controller.

The proposed controller was constructed to address the goals of lane maintenance, driver comfort improvement, and conflict reduction, filling a particular need in improving the driving experience in road vehicle transportation. In the future, the driver activity function will be enhanced by including driving style, skill, and other attributes reflecting a wider variety of driver behaviors. An extension of the proposed cooperative architecture to cruise control and integrated longitudinal–lateral control will also be carried out.

**Author Contributions:** Conceptualization: J.J.R.; Methodology: J.J.R.; Software production: G.P. and M.R.O.; Experiments: G.P., J.J.R. and M.R.O.; Writing—original draft preparation: J.J.R., G.P. and C.S.; Writing—review and editing: G.P., J.J.R., C.S. and J.-C.P. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work has been done in the framework of the CoCoVeIA research program (ANR-19- CE22-0009-01), funded by the French National Research Agency. This work was also sponsored by the French Regional Delegation for Research and Technology, the French Ministry of Higher Education and Research, and the French National Center for Scientific Research.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Nomenclature**



#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Sensors* Editorial Office E-mail: sensors@mdpi.com www.mdpi.com/journal/sensors
