The Effects of Speed and Delays on Test-Time Performance of End-to-End Self-Driving
Abstract
1. Introduction
- The type of mistakes that matter differs. Driving is a sequential decision-making process. Small temporally correlated errors can make the vehicle drift off the safe trajectory little by little [25]. Behavioral cloning with usual loss functions and random minibatch sampling penalizes such consistent but weak biases minimally. Such errors look harmless in off-policy testing. Furthermore, the long-term negative effects of actions are not evaluated in off-policy testing [24].
- Incoming data differ. During deployment, the learned policy defines the distribution of situations the vehicle ends up in. This distribution may significantly differ from the training and test set situations resulting from expert driving. Behavioral cloning models trained on human driving may fail to generalize to the distribution caused by self-driving. This well-known effect is often called the distribution shift [1,24,26].
- Delays differ. Delays play no role in the predictive behavioral cloning task optimized when creating a model. The loss does not increase if the computation is slow. When deployed in the real world, delays exist between the moment an observation is captured by the camera and the moment the actuators receive the computed response to this observation. This has been discussed for model predictive control [27,28] but not for end-to-end approaches. Due to delays, optimal decisions may become outdated by the time they are applied.
- Decision frequency differs. Prediction frequency does not influence the loss in the behavioral cloning task the models are optimized to perform. However, during driving, infrequent decisions may overshoot their intended effect, resulting in oscillation around the desired path. Furthermore, at low frequencies, the model may simply be too late to react to situations arising between decision ticks.
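The last two points can be made concrete with a toy simulation (entirely illustrative, not from the paper): a proportional controller tracks a fixed target, but each command is applied several ticks after the observation it was computed from. The function name, gain, and all constants below are our own choices.

```python
# Toy illustration of the delay problem: a proportional controller tracks a
# fixed target, but each command is applied `delay_steps` ticks after the
# observation it was computed from.

def mean_tracking_error(delay_steps, k=0.3, target=1.0, n_steps=200):
    x = 0.0
    pending = [0.0] * delay_steps          # commands still "in flight"
    total_err = 0.0
    for _ in range(n_steps):
        pending.append(k * (target - x))   # decision based on current state
        x += pending.pop(0)                # but an older command is applied
        total_err += abs(target - x)
    return total_err / n_steps
```

The same controller performs progressively worse as the delay grows: with no delay the error decays geometrically, while with larger delays the response oscillates and, past a stability threshold, diverges, which mirrors how optimal decisions become outdated by the time they are applied.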
- 1. We demonstrate that the performance of good driving pipelines may fall apart if deployed at a speed the system was not exposed to during training. The underlying reasons and the implications for on-policy testing are explained. To our knowledge, the effect of deployment speed has not previously been discussed in the end-to-end literature.
- 2. We illustrate, via real-world testing, how the performance of good driving models suffers due to computational delays. The presented results demonstrate that label-shifting the training data makes it easy to alleviate the problem, reducing the effect of delays. Incorporating delay anticipation into end-to-end models has not been attempted before.
2. Materials and Methods
2.1. Experimental Design
2.1.1. Study on the Effect of Speed
2.1.2. Study on Counteracting the Effect of Delays via Label-Shifting
2.2. Hardware Setup and Data Collection Procedure
Quality of Driving Data
2.3. Data Preparation
- Study of speed. In the slow-speed dataset (19,250 frames), the 17 m lap was completed in 24.25 ± 1.9 s on average, i.e., at 0.7 m/s. The fast dataset (20,304 frames) corresponded to an average lap time of 14.85 ± 0.8 s, i.e., 1.1 m/s. Both sets were collected by the teacher-agent driving in the evening under artificial light. From these two sets of recordings, single-frame and multi-frame datasets were created. In the latter, each data point consisted of three frames matched with the steering command recorded simultaneously with the third frame. Five-fold cross-validation was performed by dividing the data into 5 blocks along the time dimension. In off-policy evaluations, the average validation results across the five folds are reported. For multi-frame models, the data were split into several periods along the time axis, and a contiguous 1/5 of each period was assigned to each of the 5 folds. For both model types, new models were trained on the entirety of the given-speed dataset for on-policy evaluations, to make maximal use of the data and achieve the best possible performance.
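The temporal splitting described above, where each fold receives a contiguous slice of each time period rather than randomly shuffled frames, can be sketched as follows. The function name is ours, and `n_periods` is an assumption, since the text says "several periods" without giving a number.

```python
import numpy as np

def temporal_folds(n_samples, n_folds=5, n_periods=4):
    """Assign each sample index to a fold so that every fold receives a
    contiguous 1/n_folds slice of each time period (no random shuffling,
    which would leak near-duplicate neighboring frames across folds)."""
    fold_of = np.empty(n_samples, dtype=int)
    for period in np.array_split(np.arange(n_samples), n_periods):
        for fold, chunk in enumerate(np.array_split(period, n_folds)):
            fold_of[chunk] = fold
    return fold_of
```

With `n_periods=1` this reduces to the plain division into 5 contiguous blocks along the time dimension used for the single-frame models.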
- Study of counteracting the effect of delays by label-shifting. All data in this study were recorded by a very proficient human driver with an average lap time of 8.32 ± 0.41 s. Data were collected in the afternoon with no direct sunlight or shadows on the track. Datasets matching camera frames with commands recorded up to 100 ms before and up to 200 ms after the frame capture were created. In total, there are seven datasets with the labels shifted by −100 ms, −50 ms, 0 ms, 50 ms, 100 ms, 150 ms, and 200 ms (due to recording at 20 Hz, shifting by one position corresponds to 50 ms). Each dataset was divided into training and validation sets with a random 80/20 split (46,620 and 11,655 frames, respectively). The validation set was used only for early stopping.
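Building these datasets amounts to re-pairing frames and commands shifted by a fixed number of recording positions. A minimal sketch, under our own naming (the function and its slicing logic are not from the paper):

```python
def shift_labels(frames, commands, shift_ms, hz=20.0):
    """Pair each camera frame with the steering command recorded shift_ms
    later (positive shift) or earlier (negative shift). At 20 Hz recording,
    one position corresponds to 50 ms."""
    step = round(shift_ms * hz / 1000.0)
    if step > 0:    # future label: frame t is paired with command t+step
        return frames[:-step], commands[step:]
    if step < 0:    # past label: frame t is paired with command t+step
        return frames[-step:], commands[:step]
    return frames, commands
```

A positive shift teaches the model to predict the command the driver will issue slightly in the future, which is what compensates for the inference delay at deployment time.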
2.4. Architectures and Training
- 1. Single-frame CNN architecture, trained on fast data.
- 2. Single-frame CNN architecture, trained on slow data.
- 3. Multi-frame CNN architecture, trained on fast data.
- 4. Multi-frame CNN architecture, trained on slow data.
2.5. Evaluation Metrics in the Study of Speed
Measuring the Out-of-Distribution Effect
- 1. Using training data, the final embedding layer neuron activations are computed at three possible locations on the computational graph: (a) after the matrix multiplication, (b) after batch normalization (BN), and (c) after both BN and ReLU activation. The analysis is run separately for each extraction location. These activation vectors are called the reference activations.
- 2. Similarly, neural activations are computed on the validation set data points of the same-speed dataset. The resulting activations are referred to as same-speed activations.
- 3. Every validation sample is described by a measure of distance to the reference set, defined as the average distance to the 5 nearest reference-set activation vectors. Euclidean and cosine distances are employed as the proximity measures, and a separate analysis is performed for each (Ref. [33] proposed using the Mahalanobis distance, but in our experience these simpler metrics perform competitively across different datasets).
- 4. The activation patterns for the entirety of the other-speed dataset are computed. These activation vectors are called the novel-speed activations. Their distances to the reference set are computed according to the same metric.
- 5. Roughly, the further a data point's activation pattern lies from the training patterns, the further out-of-distribution the data point is judged to be for the given model [33]. By setting a threshold on this distance, one can attempt to separate the same-speed and novel-speed activations, under the assumption that novel-speed activations differ more and mostly fall above the threshold. The AUROC of such a classifier is computed and reported as the main separability metric.
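The k-nearest-neighbor distance and the threshold-free AUROC from the steps above can be sketched as follows (Euclidean variant only; function names are ours, and the rank-based AUROC formulation is one standard way to compute it):

```python
import numpy as np

def knn_score(queries, reference, k=5):
    """Mean Euclidean distance from each query activation vector to its
    k nearest reference (training) activation vectors."""
    d = np.linalg.norm(queries[:, None, :] - reference[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, :k].mean(axis=1)

def auroc(novel_scores, same_scores):
    """Rank-based AUROC: the probability that a randomly chosen novel-speed
    sample scores higher than a randomly chosen same-speed sample
    (ties count half)."""
    novel = np.asarray(novel_scores)[:, None]
    same = np.asarray(same_scores)[None, :]
    return float((novel > same).mean() + 0.5 * (novel == same).mean())
```

With reference activations from the training set, one would compute `auroc(knn_score(novel_speed_acts, reference), knn_score(same_speed_acts, reference))`; the cosine-distance variant only swaps the distance computation inside `knn_score`.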
2.6. Evaluation Metrics in the Study of Delays
2.7. Code and Data Availability
3. Results
3.1. Changing Speed Causes a Shift in the Task
Additional Cause: Multi-Frame Inputs Become Out-of-Distribution
3.2. The Effect of Delays Can Be Counteracted by Label-Shifting
4. Discussion
4.1. Discussion of Results
4.2. Limitations and Future Work
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Tampuu, A.; Matiisen, T.; Semikin, M.; Fishman, D.; Muhammad, N. A survey of end-to-end driving: Architectures and training methods. IEEE Trans. Neural Netw. Learn. Syst. 2020, 33, 1364–1384.
- Ly, A.O.; Akhloufi, M. Learning to drive by imitation: An overview of deep behavior cloning methods. IEEE Trans. Intell. Veh. 2020, 6, 195–209.
- Huang, Y.; Chen, Y. Autonomous driving with deep learning: A survey of state-of-art technologies. arXiv 2020, arXiv:2006.06091.
- Chen, L.; Wu, P.; Chitta, K.; Jaeger, B.; Geiger, A.; Li, H. End-to-end autonomous driving: Challenges and frontiers. arXiv 2023, arXiv:2306.16927.
- Pomerleau, D.A. Alvinn: An autonomous land vehicle in a neural network. In Proceedings of the Advances in Neural Information Processing Systems, Denver, CO, USA, 27–30 November 1989; pp. 305–313.
- Codevilla, F.; Santana, E.; López, A.M.; Gaidon, A. Exploring the limitations of behavior cloning for autonomous driving. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9329–9338.
- Chitta, K.; Prakash, A.; Jaeger, B.; Yu, Z.; Renz, K.; Geiger, A. Transfuser: Imitation with transformer-based sensor fusion for autonomous driving. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 12878–12895.
- Bojarski, M.; Del Testa, D.; Dworakowski, D.; Firner, B.; Flepp, B.; Goyal, P.; Jackel, L.D.; Monfort, M.; Muller, U.; Zhang, J.; et al. End to end learning for self-driving cars. arXiv 2016, arXiv:1604.07316.
- Osiński, B.; Jakubowski, A.; Zięcina, P.; Miłoś, P.; Galias, C.; Homoceanu, S.; Michalewski, H. Simulation-based reinforcement learning for real-world autonomous driving. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 6411–6418.
- Bansal, M.; Krizhevsky, A.; Ogale, A. Chauffeurnet: Learning to drive by imitating the best and synthesizing the worst. arXiv 2018, arXiv:1812.03079.
- Kendall, A.; Hawke, J.; Janz, D.; Mazur, P.; Reda, D.; Allen, J.M.; Lam, V.D.; Bewley, A.; Shah, A. Learning to drive in a day. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 8248–8254.
- Chekroun, R.; Toromanoff, M.; Hornauer, S.; Moutarde, F. Gri: General reinforced imitation and its application to vision-based autonomous driving. Robotics 2023, 12, 127.
- Bewley, A.; Rigley, J.; Liu, Y.; Hawke, J.; Shen, R.; Lam, V.D.; Kendall, A. Learning to drive from simulation without real world labels. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 4818–4824.
- Zeng, W.; Luo, W.; Suo, S.; Sadat, A.; Yang, B.; Casas, S.; Urtasun, R. End-to-end Interpretable Neural Motion Planner. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 8660–8669.
- Sadat, A.; Casas, S.; Ren, M.; Wu, X.; Dhawan, P.; Urtasun, R. Perceive, predict, and plan: Safe motion planning through interpretable semantic representations. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XXIII 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 414–430.
- Hu, P.; Huang, A.; Dolan, J.; Held, D.; Ramanan, D. Safe local motion planning with self-supervised freespace forecasting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 12732–12741.
- Stocco, A.; Pulfer, B.; Tonella, P. Mind the gap! A study on the transferability of virtual vs. physical-world testing of autonomous driving systems. IEEE Trans. Softw. Eng. 2022, 49, 1928–1940.
- Sauer, A.; Savinov, N.; Geiger, A. Conditional affordance learning for driving in urban environments. In Proceedings of the Conference on Robot Learning, Zurich, Switzerland, 29–31 October 2018; pp. 237–252.
- Hecker, S.; Dai, D.; Van Gool, L. End-to-end learning of driving models with surround-view cameras and route planners. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 435–453.
- Xu, H.; Gao, Y.; Yu, F.; Darrell, T. End-to-end learning of driving models from large-scale video datasets. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2174–2182.
- Prakash, A.; Chitta, K.; Geiger, A. Multi-modal fusion transformer for end-to-end autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 7077–7087.
- Shao, H.; Wang, L.; Chen, R.; Li, H.; Liu, Y. Safety-enhanced autonomous driving using interpretable sensor fusion transformer. In Proceedings of the Conference on Robot Learning, Atlanta, GA, USA, 6–9 November 2023; pp. 726–737.
- Codevilla, F.; López, A.M.; Koltun, V.; Dosovitskiy, A. On offline evaluation of vision-based driving models. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 236–251.
- Haq, F.U.; Shin, D.; Nejati, S.; Briand, L. Can offline testing of deep neural networks replace their online testing? A case study of automated driving systems. Empir. Softw. Eng. 2021, 26, 90.
- Codevilla, F.; Miiller, M.; López, A.; Koltun, V.; Dosovitskiy, A. End-to-end driving via conditional imitation learning. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–9.
- Ross, S.; Gordon, G.; Bagnell, D. A reduction of imitation learning and structured prediction to no-regret online learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, Lauderdale, FL, USA, 11–13 April 2011; pp. 627–635.
- Kalaria, D.; Lin, Q.; Dolan, J.M. Delay-Aware Robust Control for Safe Autonomous Driving and Racing. IEEE Trans. Intell. Transp. Syst. 2023, 1–11.
- Liao, Y.; Liao, F. Design of preview controller for linear continuous-time systems with input delay. Int. J. Control. Autom. Syst. 2018, 16, 1080–1090.
- Tampuu, A.; Aidla, R.; van Gent, J.A.; Matiisen, T. Lidar-as-camera for end-to-end driving. Sensors 2023, 23, 2845.
- Donkey Car. Available online: https://www.donkeycar.com/ (accessed on 30 January 2022).
- Tiedemann, T.; Schwalb, L.; Kasten, M.; Grotkasten, R.; Pareigis, S. Miniature autonomy as means to find new approaches in reliable autonomous driving AI method design. Front. Neurorobot. 2022, 16, 846355.
- Argall, B.D.; Chernova, S.; Veloso, M.; Browning, B. A survey of robot learning from demonstration. Robot. Auton. Syst. 2009, 57, 469–483.
- Lee, K.; Lee, K.; Lee, H.; Shin, J. A simple unified framework for detecting out-of-distribution samples and adversarial attacks. Adv. Neural Inf. Process. Syst. 2018, 31, 7167–7177.
- Hendrycks, D.; Mazeika, M.; Dietterich, T. Deep anomaly detection with outlier exposure. arXiv 2018, arXiv:1812.04606.
- Jain, A.; Bansal, R.; Kumar, A.; Singh, K. A comparative study of visual and auditory reaction times on the basis of gender and physical activity levels of medical first year students. Int. J. Appl. Basic Med. Res. 2015, 5, 124.
- Khan, F.; Falco, M.; Anwar, H.; Pfahl, D. Safety Testing of Automated Driving Systems: A Literature Review. IEEE Access 2023, 11, 120049–120072.
- Hecker, S.; Dai, D.; Van Gool, L. Learning Accurate, Comfortable and Human-like Driving. arXiv 2019, arXiv:1903.10995.
- Elbanhawi, M.; Simic, M.; Jazar, R. In the passenger seat: Investigating ride comfort measures in autonomous cars. IEEE Intell. Transp. Syst. Mag. 2015, 7, 4–17.
Layer | Hyperparameters | Dropout | Activation |
---|---|---|---|
Input | shape (height, 160, 3) | none | none |
Conv2d | filters = 24, size = 5, stride = 2 | 0.2 | ReLU |
Conv2d | filters = 32, size = 5, stride = 2 | 0.2 | ReLU |
Conv2d | filters = 64, size = 5, stride = 2 | 0.2 | ReLU |
Conv2d | filters = 64, size = 3, stride = 1 | 0.2 | ReLU |
Conv2d | filters = 64, size = 3, stride = 1 | 0.2 | ReLU |
Flatten | - | - | - |
Linear | nodes = 100 | 0.2 | ReLU |
Linear | nodes = 50 | 0.2 | ReLU |
Linear | nodes = 1 | none | none |
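The single-frame table above can be rendered, for instance, in PyTorch. This is our own sketch, not the authors' code: the table leaves the input height unspecified ("height"), so 66 is an assumption (chosen PilotNet-style so that the unpadded convolutions keep valid spatial sizes), and `LazyLinear` is used so the flattened size need not be hard-coded.

```python
import torch
import torch.nn as nn

# Sketch of the single-frame CNN from the table above. Assumptions (ours):
# input height 66, no convolution padding, dropout applied after each ReLU.
single_frame = nn.Sequential(
    nn.Conv2d(3, 24, kernel_size=5, stride=2), nn.ReLU(), nn.Dropout(0.2),
    nn.Conv2d(24, 32, kernel_size=5, stride=2), nn.ReLU(), nn.Dropout(0.2),
    nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(), nn.Dropout(0.2),
    nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(), nn.Dropout(0.2),
    nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(), nn.Dropout(0.2),
    nn.Flatten(),
    nn.LazyLinear(100), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(100, 50), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(50, 1),  # single steering-command output
)

out = single_frame(torch.zeros(2, 3, 66, 160))  # batch of 2 dummy frames
```

Note that PyTorch expects channels-first input, so the table's (height, 160, 3) frame becomes a (3, height, 160) tensor.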
Layer | Hyperparameters | Dropout | Activation |
---|---|---|---|
Input | size = (3, 60, 160, 3) | - | -
Conv3d | filters = 16, size = (3, 3, 3), stride = (1, 3, 3) | - | ReLU |
MaxPool3D | pool_size = (1, 2, 2), stride = (1, 2, 2) | - | - |
Conv3d | filters = 32, size = (1, 3, 3), stride = (1, 1, 1) | - | ReLU |
MaxPool3D | pool_size = (1, 2, 2), stride = (1, 2, 2) | - | - |
Conv3d | filters = 32, size = (1, 3, 3), stride = (1, 1, 1) | - | ReLU |
MaxPool3D | pool_size = (1, 2, 2), stride = (1, 2, 2) | - | - |
Flatten | - | - | - |
Linear | nodes = 128, batch normalization | 0.2 | ReLU |
Linear | nodes = 256, batch normalization | 0.2 | ReLU |
Linear | nodes = 1 | - | none |
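The multi-frame table can likewise be sketched in PyTorch (again our rendering, not the authors' code, assuming unpadded convolutions and channels-first layout, with the 3 input frames stacked along the depth dimension):

```python
import torch
import torch.nn as nn

# Sketch of the multi-frame Conv3D network from the table above.
# Input layout (ours): (N, C=3 color channels, D=3 frames, H=60, W=160).
multi_frame = nn.Sequential(
    nn.Conv3d(3, 16, kernel_size=(3, 3, 3), stride=(1, 3, 3)), nn.ReLU(),
    nn.MaxPool3d(kernel_size=(1, 2, 2), stride=(1, 2, 2)),
    nn.Conv3d(16, 32, kernel_size=(1, 3, 3), stride=(1, 1, 1)), nn.ReLU(),
    nn.MaxPool3d(kernel_size=(1, 2, 2), stride=(1, 2, 2)),
    nn.Conv3d(32, 32, kernel_size=(1, 3, 3), stride=(1, 1, 1)), nn.ReLU(),
    nn.MaxPool3d(kernel_size=(1, 2, 2), stride=(1, 2, 2)),
    nn.Flatten(),
    nn.LazyLinear(128), nn.BatchNorm1d(128), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(128, 256), nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(256, 1),  # single steering-command output
)

out3d = multi_frame(torch.zeros(2, 3, 3, 60, 160))  # 2 dummy 3-frame clips
```

The first 3D convolution collapses the depth dimension (3 frames, kernel depth 3, no padding), so the subsequent layers operate on effectively 2D feature maps.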
Model Type | Validation Data Speed | Mean Absolute Error
---|---|---
slow single frame | slow | 0.0232
slow single frame | fast | 0.0473
fast single frame | fast | 0.0237
fast single frame | slow | 0.0266
on average: | known | 0.0235
on average: | novel | 0.0367
slow multi-frame | slow | 0.0888
slow multi-frame | fast | 0.1298
fast multi-frame | fast | 0.0612
fast multi-frame | slow | 0.0614
on average: | known | 0.0754
on average: | novel | 0.0947
Model Type | Deployment Speed | Infractions |
---|---|---|
slow single frame | slow | 0 |
slow single frame | fast | 10 |
fast single frame | fast | 2 |
fast single frame | slow | 16 |
slow multi-frame | slow | 0
slow multi-frame | fast | 20 |
fast multi-frame | fast | 8 |
fast multi-frame | slow | 19 |
Training Data | Metric | Validation Data (Same Speed) | Validation Data (Novel Speed)
---|---|---|---
Slow | Euclidean | 0.81 | 1.82 |
Slow | Cosine | 0.004 | 0.019 |
Fast | Euclidean | 0.81 | 1.21 |
Fast | Cosine | 0.006 | 0.013 |
Compute Time | Label −100 ms | Label −50 ms | No Shift | Label +50 ms | Label +100 ms | Label +150 ms | Label +200 ms
---|---|---|---|---|---|---|---
24 ms | 10.3 | 9.3 | 8.5 | 7.4 | ∞ | ∞ | ∞ |
49 ms | 11.9 | 9.7 | 8.6 | 7.8 | 8 | ∞ | ∞ |
74 ms | 13.2 | 11.8 | 10.1 | 9.1 | 8.5 | ∞ | ∞ |
99 ms | 16.2 | 13.8 | 11.5 | 10.5 | 9.4 | 8.1 | ∞ |
124 ms | 17.5 | 13.7 | 12.5 | 11.3 | 10.6 | 9.1 | ∞ |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Tampuu, A.; Roosild, K.; Uduste, I. The Effects of Speed and Delays on Test-Time Performance of End-to-End Self-Driving. Sensors 2024, 24, 1963. https://doi.org/10.3390/s24061963