Generalization Enhancement of Visual Reinforcement Learning through Internal States
Abstract
1. Introduction
2. Related Works and Preliminaries
2.1. Visual Reinforcement Learning
2.2. Generalization
2.3. Domain Randomization
2.4. Data Augmentation
3. Method
3.1. Framework
3.2. Transfer Learning Phase Details
Algorithm 1: Full training process for the proposed ISSA
Inputs: the teacher's parametric networks for the policy and Q functions, both conditioned on the combination of visual observation and internal states; the student's parametric networks for the policy and Q functions, both conditioned solely on the visual observation; the image augmentation method inherited from DrQ-v2; a parametric network for the image encoder; the numbers of training steps for transfer learning and reinforcement learning; the mini-batch size; the learning rate; and the target update rate.
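Only the inputs of Algorithm 1 are reproduced above. As a rough illustration of how the transfer-learning (distillation) phase described here could be wired up, the sketch below has a frozen teacher, conditioned on visual features plus internal states, supervise a vision-only student on DrQ-v2-style augmented observations. The architectures, the shared image encoder, the MSE distillation losses, and the names (`Encoder`, `Head`, `random_shift`, `distill_step`) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the transfer-learning (distillation) phase outlined in Algorithm 1.
# Everything below is an illustrative assumption rather than the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Image encoder mapping stacked frames to a compact feature vector."""
    def __init__(self, in_channels=9, feat_dim=50):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=1), nn.ReLU(),
            nn.Flatten())
        # Flattened conv output size assumes 84x84 input frames.
        self.fc = nn.Linear(32 * 39 * 39, feat_dim)

    def forward(self, obs):
        return self.fc(self.conv(obs / 255.0))

class Head(nn.Module):
    """Small MLP reused for policy heads (action outputs) and Q heads (scalar values)."""
    def __init__(self, in_dim, out_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, out_dim))

    def forward(self, x):
        return self.net(x)

def random_shift(obs, pad=4):
    """Random-shift augmentation in the spirit of DrQ-v2 (simplified re-implementation)."""
    padded = F.pad(obs.float(), (pad, pad, pad, pad), mode='replicate')
    h, w = obs.shape[-2:]
    top = torch.randint(0, 2 * pad + 1, (1,)).item()
    left = torch.randint(0, 2 * pad + 1, (1,)).item()
    return padded[..., top:top + h, left:left + w]

def distill_step(batch, teacher, student, encoder, optimizer):
    """One transfer-learning update: the vision-only student imitates the policy and
    Q outputs of the frozen teacher, which also sees the internal states."""
    obs, internal_state, action = batch
    feat = encoder(random_shift(obs))

    with torch.no_grad():  # the teacher only provides supervision targets
        t_act = teacher['policy'](torch.cat([feat, internal_state], dim=-1))
        t_q = teacher['q'](torch.cat([feat, internal_state, action], dim=-1))

    s_act = student['policy'](feat)                        # visual features only
    s_q = student['q'](torch.cat([feat, action], dim=-1))

    loss = F.mse_loss(s_act, t_act) + F.mse_loss(s_q, t_q)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with dummy tensors: batch of 8, three stacked RGB frames (9 channels),
# a 24-D internal state and a 6-D action space.
if __name__ == "__main__":
    enc = Encoder()
    teacher = {'policy': Head(50 + 24, 6), 'q': Head(50 + 24 + 6, 1)}
    student = {'policy': Head(50, 6), 'q': Head(50 + 6, 1)}
    params = list(enc.parameters()) + list(student['policy'].parameters()) \
             + list(student['q'].parameters())
    opt = torch.optim.Adam(params, lr=1e-4)
    batch = (torch.randint(0, 256, (8, 9, 84, 84)), torch.randn(8, 24), torch.randn(8, 6))
    print(distill_step(batch, teacher, student, enc, opt))
```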
4. Experiments
4.1. Setups
4.2. Evaluation on Generalization Ability
4.3. Evaluation on Sample Efficiency
4.4. Ablation Study
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Meaning
---|---
DR | Domain Randomization
DA | Data Augmentation
DMControl-GB | DMControl Generalization Benchmark
DMC | DeepMind Control Suite
MDP | Markov Decision Process
POMDP | Partially Observable Markov Decision Process
References
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
- Vinyals, O.; Ewalds, T.; Bartunov, S.; Georgiev, P.; Vezhnevets, A.S.; Yeo, M.; Makhzani, A.; Küttler, H.; Agapiou, J.; Schrittwieser, J.; et al. StarCraft II: A new challenge for reinforcement learning. arXiv 2017, arXiv:1708.04782.
- Tobin, J.; Fong, R.; Ray, A.; Schneider, J.; Zaremba, W.; Abbeel, P. Domain randomization for transferring deep neural networks from simulation to the real world. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vancouver, BC, Canada, 24–28 September 2017; pp. 23–30.
- Ren, X.; Luo, J.; Solowjow, E.; Ojea, J.A.; Gupta, A.; Tamar, A.; Abbeel, P. Domain randomization for active pose estimation. In Proceedings of the 2019 International Conference on Robotics and Automation, Montreal, QC, Canada, 20–24 May 2019; pp. 7228–7234.
- Chaysri, P.; Spatharis, C.; Vlachos, K.; Blekas, K. Design and implementation of a low-cost intelligent unmanned surface vehicle. Sensors 2024, 24, 3254.
- Wen, Y.; Chen, Y.; Guo, X. USV trajectory tracking control based on receding horizon reinforcement learning. Sensors 2024, 24, 2771.
- Al-Hamadani, M.N.; Fadhel, M.A.; Alzubaidi, L.; Harangi, B. Reinforcement learning algorithms and applications in healthcare and robotics: A comprehensive and systematic review. Sensors 2024, 24, 2461.
- Zhu, Y.; Mottaghi, R.; Kolve, E.; Lim, J.J.; Gupta, A.; Fei-Fei, L.; Farhadi, A. Target-driven visual navigation in indoor scenes using deep reinforcement learning. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation, Singapore, 29 May–3 June 2017; pp. 3357–3364.
- Ejaz, M.M.; Tang, T.B.; Lu, C.K. Vision-based autonomous navigation approach for a tracked robot using deep reinforcement learning. IEEE Sensors J. 2020, 21, 2230–2240.
- Wang, C.; Wang, Y. Safe autonomous driving with latent dynamics and state-wise constraints. Sensors 2024, 24, 3139.
- Zhao, R.; Wang, K.; Che, W.; Li, Y.; Fan, Y.; Gao, F. Adaptive cruise control based on safe deep reinforcement learning. Sensors 2024, 24, 2657.
- Cobbe, K.; Klimov, O.; Hesse, C.; Kim, T.; Schulman, J. Quantifying generalization in reinforcement learning. In Proceedings of the International Conference on Machine Learning (PMLR), Long Beach, CA, USA, 10–15 June 2019; pp. 1282–1289.
- Gamrian, S.; Goldberg, Y. Transfer learning for related reinforcement learning tasks via image-to-image translation. In Proceedings of the International Conference on Machine Learning (PMLR), Long Beach, CA, USA, 10–15 June 2019; pp. 2063–2072.
- Zhang, C.; Vinyals, O.; Munos, R.; Bengio, S. A study on overfitting in deep reinforcement learning. arXiv 2018, arXiv:1804.06893.
- Farebrother, J.; Machado, M.C.; Bowling, M. Generalization and regularization in DQN. arXiv 2018, arXiv:1810.00123.
- Mehta, B.; Diaz, M.; Golemo, F.; Pal, C.J.; Paull, L. Active domain randomization. In Proceedings of the Conference on Robot Learning, Virtual, 30 October–1 November 2020; Volume 100, pp. 1162–1176.
- Hansen, N.; Wang, X. Generalization in reinforcement learning by soft data augmentation. In Proceedings of the International Conference on Robotics and Automation, Xi’an, China, 30 May–5 June 2021.
- Tassa, Y.; Doron, Y.; Muldal, A.; Erez, T.; Li, Y.; Casas, D.d.L.; Budden, D.; Abdolmaleki, A.; Merel, J.; Lefrancq, A.; et al. DeepMind control suite. arXiv 2018, arXiv:1801.00690.
- Wang, X.; Lian, L.; Yu, S.X. Unsupervised visual attention and invariance for reinforcement learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 6677–6687.
- Yu, T.; Quillen, D.; He, Z.; Julian, R.; Hausman, K.; Finn, C.; Levine, S. Meta-World: A benchmark and evaluation for multi-task and meta reinforcement learning. In Proceedings of the Conference on Robot Learning (PMLR), Virtual, 30 October–1 November 2020; pp. 1094–1100.
- Yarats, D.; Fergus, R.; Lazaric, A.; Pinto, L. Mastering visual continuous control: Improved data-augmented reinforcement learning. arXiv 2021, arXiv:2107.09645.
- Bellman, R. A Markovian decision process. J. Math. Mech. 1957, 6, 679–684.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M. Playing Atari with deep reinforcement learning. arXiv 2013, arXiv:1312.5602.
- Du, S.S.; Luo, Y.; Wang, R.; Zhang, H. Provably efficient Q-learning with function approximation via distribution shift error checking oracle. Adv. Neural Inf. Process. Syst. 2019, 32, 1–11.
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
- Cubuk, E.D.; Zoph, B.; Mane, D.; Vasudevan, V.; Le, Q.V. AutoAugment: Learning augmentation strategies from data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–19 June 2019; pp. 113–123.
- Cubuk, E.D.; Zoph, B.; Shlens, J.; Le, Q.V. RandAugment: Practical automated data augmentation with a reduced search space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 702–703.
- Laskin, M.; Lee, K.; Stooke, A.; Pinto, L.; Abbeel, P.; Srinivas, A. Reinforcement learning with augmented data. Adv. Neural Inf. Process. Syst. 2020, 33, 19884–19895.
- Kostrikov, I.; Yarats, D.; Fergus, R. Image augmentation is all you need: Regularizing deep reinforcement learning from pixels. arXiv 2020, arXiv:2004.13649.
- Hansen, N.; Su, H.; Wang, X. Stabilizing deep Q-learning with ConvNets and vision transformers under data augmentation. arXiv 2021, arXiv:2107.00644.
- Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proceedings of the International Conference on Machine Learning (PMLR), Stockholm, Sweden, 10–15 July 2018; pp. 1861–1870.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971.
- Hansen, N.; Jangir, R.; Sun, Y.; Alenyà, G.; Abbeel, P.; Efros, A.A.; Pinto, L.; Wang, X. Self-supervised policy adaptation during deployment. arXiv 2020, arXiv:2007.04309.
DMControl-GB (random colors) | SAC [32] | DrQ [30] | SODA [17] | SVEA [31] | DrQ-v2 [21] | ISSA (Student Model)
---|---|---|---|---|---|---
walker_walk | 173 ± 23 | 520 ± 91 | 660 ± 59 | 760 ± 145 | 673 ± 43 | 837 ± 105 (+10.1%)
walker_stand | 371 ± 90 | 770 ± 73 | 930 ± 12 | 942 ± 26 | 861 ± 24 | 978 ± 85 (+3.8%)
cartpole_swingup | 248 ± 24 | 586 ± 52 | 819 ± 22 | 835 ± 20 | 814 ± 80 | 873 ± 28 (+4.6%)
ball_in_cup_catch | 172 ± 21 | 365 ± 210 | 949 ± 19 | 961 ± 7 | 469 ± 99 | 997 ± 23 (+3.7%)
finger_spin | 372 ± 64 | 776 ± 134 | 896 ± 82 | 977 ± 5 | 730 ± 110 | 970 ± 19

DMControl-GB (video backgrounds) | SAC [32] | DrQ [30] | SODA [17] | SVEA [31] | DrQ-v2 [21] | ISSA (Student Model)
---|---|---|---|---|---|---
walker_walk | 142 ± 17 | 516 ± 93 | 692 ± 68 | 819 ± 71 | 719 ± 27 | 894 ± 27 (+9.1%)
walker_stand | 349 ± 103 | 790 ± 76 | 893 ± 12 | 961 ± 8 | 673 ± 19 | 935 ± 23
cartpole_swingup | 248 ± 23 | 579 ± 47 | 758 ± 62 | 782 ± 27 | 267 ± 41 | 836 ± 76 (+6.9%)
ball_in_cup_catch | 151 ± 36 | 365 ± 210 | 875 ± 56 | 871 ± 106 | 469 ± 99 | 967 ± 57 (+10.5%)
finger_spin | 280 ± 17 | 776 ± 134 | 793 ± 128 | 803 ± 33 | 780 ± 72 | 848 ± 24 (+5.6%)
Task (Success %) | Setting | SAC [32] | DrQ [30] | PAD [36] | SVEA [31] | ISSA (Student Model)
---|---|---|---|---|---|---
DrawerOpen | train | 98 ± 2 | 100 ± 0 | 84 ± 7 | 87 ± 5 | 100 ± 0 (+0)
DrawerOpen | black | 95 ± 2 | 99 ± 1 | 95 ± 3 | 56 ± 29 | 100 ± 0 (+1)
DrawerOpen | blanket | 28 ± 8 | 28 ± 15 | 54 ± 6 | 35 ± 23 | 91 ± 4 (+37)
DrawerOpen | fabric | 2 ± 1 | 0 ± 0 | 20 ± 6 | 25 ± 12 | 85 ± 3 (+60)
DrawerOpen | metal | 35 ± 7 | 35 ± 35 | 81 ± 3 | 83 ± 10 | 98 ± 1 (+15)
DrawerOpen | marble | 3 ± 1 | 0 ± 0 | 3 ± 1 | 18 ± 24 | 56 ± 8 (+38)
DrawerOpen | wood | 18 ± 5 | 5 ± 3 | 39 ± 9 | 40 ± 6 | 93 ± 4 (+53)
DrawerClose | train | 100 ± 0 | 91 ± 7 | 95 ± 3 | 60 ± 23 | 100 ± 0 (+0)
DrawerClose | black | 75 ± 4 | 65 ± 13 | 64 ± 9 | 55 ± 16 | 100 ± 0 (+25)
DrawerClose | blanket | 0 ± 0 | 44 ± 23 | 0 ± 0 | 4 ± 6 | 83 ± 8 (+39)
DrawerClose | fabric | 0 ± 0 | 80 ± 14 | 0 ± 0 | 3 ± 4 | 87 ± 3 (+7)
DrawerClose | metal | 0 ± 0 | 51 ± 25 | 2 ± 2 | 47 ± 11 | 97 ± 4 (+46)
DrawerClose | marble | 0 ± 0 | 86 ± 10 | 0 ± 0 | 25 ± 3 | 89 ± 2 (+3)
DrawerClose | wood | 0 ± 0 | 50 ± 32 | 12 ± 2 | 26 ± 2 | 70 ± 7 (+20)
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).