Visual Pretraining via Contrastive Predictive Model for Pixel-Based Reinforcement Learning
Abstract
1. Introduction
2. Related Work
3. Background
3.1. Reinforcement Learning from Images
3.2. Soft Actor-Critic
4. Method
4.1. Network Architecture
4.2. Visual Pretraining via Contrastive Predictive Model
Algorithm 1. Visual pretraining via contrastive predictive model (VPCPM)
5. Experimental Results
5.1. Experiment Setup
5.2. Effects of Pretrained Representation
5.3. Comparison with Prior Methods
5.4. Effects of Components during Pretraining
5.5. Generalization over Unseen Tasks
5.6. Pretraining with Classification
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef] [PubMed]
- Jaderberg, M.; Mnih, V.; Czarnecki, W.M.; Schaul, T.; Leibo, J.Z.; Silver, D.; Kavukcuoglu, K. Reinforcement learning with unsupervised auxiliary tasks. In Proceedings of the 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, 24–26 April 2017. [Google Scholar]
- Espeholt, L.; Soyer, H.; Munos, R.; Simonyan, K.; Mnih, V.; Ward, T.; Doron, Y.; Firoiu, V.; Harley, T.; Dunning, I.; et al. Impala: Scalable distributed deep-rl with importance weighted actor-learner architectures. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholm, Sweden, 10–15 July 2018; pp. 1407–1416. [Google Scholar]
- Levine, S.; Finn, C.; Darrell, T.; Abbeel, P. End-to-end training of deep visuomotor policies. J. Mach. Learn. Res. 2016, 17, 1334–1373. [Google Scholar]
- Lee, A.X.; Nagabandi, A.; Abbeel, P.; Levine, S. Stochastic latent actor-critic: Deep reinforcement learning with a latent variable model. In Proceedings of the 34th Advances in Neural Information Processing Systems, NeurIPS 2020, Online, 6–12 December 2020; pp. 741–752. [Google Scholar]
- Kalashnikov, D.; Irpan, A.; Pastor, P.; Ibarz, J.; Herzog, A.; Jang, E.; Quillen, D.; Holly, E.; Kalakrishnan, M.; Vanhoucke, V.; et al. Scalable deep reinforcement learning for vision-based robotic manipulation. In Proceedings of the 2nd Conference on Robot Learning, CoRL 2018, Zürich, Switzerland, 29–31 October 2018; pp. 651–673. [Google Scholar]
- Akkaya, I.; Andrychowicz, M.; Chociej, M.; Litwin, M.; McGrew, B.; Petron, A.; Paino, A.; Plappert, M.; Powell, G.; Ribas, R.; et al. Solving rubik’s cube with a robot hand. arXiv 2019, arXiv:1910.07113. [Google Scholar]
- Julian, R.; Swanson, B.; Sukhatme, G.S.; Levine, S.; Finn, C.; Hausman, K. Never Stop Learning: The Effectiveness of Fine-Tuning in Robotic Reinforcement Learning. In Proceedings of the 4th Conference on Robot Learning, CoRL 2020, Online, 16–18 November 2020. [Google Scholar]
- Liu, H.; Abbeel, P. Behavior from the void: Unsupervised active pre-training. In Proceedings of the 35th Advances in Neural Information Processing Systems, NeurIPS 2021, Online, 6–14 December 2021; pp. 18459–18473. [Google Scholar]
- Shah, R.; Kumar, V. Rrl: Resnet as representation for reinforcement learning. In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, Online, 18–24 July 2021. [Google Scholar]
- Lange, S.; Riedmiller, M. Deep auto-encoder neural networks in reinforcement learning. In Proceedings of the International Joint Conference on Neural Networks, IJCNN 2010, Barcelona, Spain, 18–23 July 2010; pp. 1–8. [Google Scholar]
- Finn, C.; Tan, X.Y.; Duan, Y.; Darrell, T.; Levine, S.; Abbeel, P. Deep spatial autoencoders for visuomotor learning. In Proceedings of the International Conference on Robotics and Automation, ICRA 2016, Stockholm, Sweden, 16–21 May 2016. [Google Scholar]
- Nair, A.V.; Pong, V.; Dalal, M.; Bahl, S.; Lin, S.; Levine, S. Visual reinforcement learning with imagined goals. In Proceedings of the 32nd Advances in Neural Information Processing Systems, NeurIPS 2018, Montreal, QC, Canada, 3–8 December 2018. [Google Scholar]
- Srinivas, A.; Laskin, M.; Abbeel, P. Curl: Contrastive unsupervised representations for reinforcement learning. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, Online, 12–18 July 2020. [Google Scholar]
- Stooke, A.; Lee, K.; Abbeel, P.; Laskin, M. Decoupling representation learning from reinforcement learning. In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, Online, 18–24 July 2021; pp. 9870–9879. [Google Scholar]
- Tunyasuvunakool, S.; Muldal, A.; Doron, Y.; Liu, S.; Bohez, S.; Merel, J.; Erez, T.; Lillicrap, T.; Heess, N.; Tassa, Y. dm_control: Software and tasks for continuous control. Softw. Impacts 2020, 6, 100022. [Google Scholar] [CrossRef]
- Laskin, M.; Lee, K.; Stooke, A.; Pinto, L.; Abbeel, P.; Srinivas, A. Reinforcement Learning with Augmented Data. In Proceedings of the 34th Advances in Neural Information Processing Systems, NeurIPS 2020, Online, 6–12 December 2020; pp. 19884–19895. [Google Scholar]
- Yarats, D.; Kostrikov, I.; Fergus, R. Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels. In Proceedings of the 9th International Conference on Learning Representations, ICLR 2021, Online, 3–7 May 2021. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Proceedings of the 26th Advances in Neural Information Processing Systems, NeurIPS 2012, Lake Tahoe, NV, USA, 3–8 December 2012. [Google Scholar]
- Wojke, N.; Bewley, A.; Paulus, D. Simple online and realtime tracking with a deep association metric. In Proceedings of the 24th IEEE International Conference on Image Processing, ICIP 2017, Beijing, China, 17–20 September 2017; pp. 3645–3649. [Google Scholar]
- Wojke, N.; Bewley, A. Deep cosine metric learning for person re-identification. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision, WACV 2018, Lake Tahoe, NV, USA, 12–15 March 2018; pp. 748–756. [Google Scholar]
- Peng, Y.; Rysanek, A.; Nagy, Z.; Schlüter, A. Using machine learning techniques for occupancy-prediction-based cooling control in office buildings. Appl. Energy 2018, 211, 1343–1358. [Google Scholar] [CrossRef]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
- Vu, T.; Jang, H.; Pham, T.X.; Yoo, C. Cascade rpn: Delving into high-quality region proposal network with adaptive convolution. In Proceedings of the 33rd Advances in Neural Information Processing Systems, NeurIPS 2019, Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
- Vu, T.; Kang, H.; Yoo, C.D. Scnet: Training inference sample consistency for instance segmentation. In Proceedings of the 35th Association for the Advancement of Artificial Intelligence, AAAI 2021, Online, 2–9 February 2021; pp. 2701–2709. [Google Scholar]
- Jiang, L.; Zhao, H.; Shi, S.; Liu, S.; Fu, C.W.; Jia, J. Pointgroup: Dual-set point grouping for 3d instance segmentation. In Proceedings of the IEEE/CVF Computer Vision and Pattern Recognition, CVPR 2020, Online, 14–19 June 2020; pp. 4867–4876. [Google Scholar]
- Chen, S.; Fang, J.; Zhang, Q.; Liu, W.; Wang, X. Hierarchical aggregation for 3d instance segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, ICCV 2021, Online, 11–17 October 2021; pp. 15467–15476. [Google Scholar]
- Vu, T.; Kim, K.; Luu, T.M.; Nguyen, X.T.; Yoo, C.D. SoftGroup for 3D Instance Segmentation on 3D Point Clouds. In Proceedings of the IEEE/CVF Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, 19–24 June 2022. [Google Scholar]
- Choy, C.B.; Xu, D.; Gwak, J.; Chen, K.; Savarese, S. 3d-r2n2: A unified approach for single and multi-view 3d object reconstruction. In Proceedings of the 14th European Conference on Computer Vision, ECCV 2016, Amsterdam, The Netherlands, 8–16 October 2016; pp. 628–644. [Google Scholar]
- Rosinol, A.; Sattler, T.; Pollefeys, M.; Carlone, L. Incremental visual-inertial 3d mesh generation with structural regularities. In Proceedings of the IEEE International Conference on Robotics and Automation, ICRA 2019, Montreal, QC, Canada, 20–24 May 2019; pp. 8220–8226. [Google Scholar]
- Chen, M.; Tang, Y.; Zou, X.; Huang, K.; Li, L.; He, Y. High-accuracy multi-camera reconstruction enhanced by adaptive point cloud correction algorithm. Opt. Lasers Eng. 2019, 122, 170–183. [Google Scholar] [CrossRef]
- Zhang, F.; Leitner, J.; Milford, M.; Upcroft, B.; Corke, P. Towards vision-based deep reinforcement learning for robotic motion control. arXiv 2015, arXiv:1511.03791. [Google Scholar]
- Ebert, F.; Finn, C.; Dasari, S.; Xie, A.; Lee, A.; Levine, S. Visual foresight: Model-based deep reinforcement learning for vision-based robotic control. arXiv 2018, arXiv:1812.00568. [Google Scholar]
- Beattie, C.; Leibo, J.Z.; Teplyashin, D.; Ward, T.; Wainwright, M.; Küttler, H.; Lefrancq, A.; Green, S.; Valdés, V.; Sadik, A.; et al. Deepmind lab. arXiv 2016, arXiv:1612.03801. [Google Scholar]
- Kingma, D.P.; Welling, M. Auto-encoding variational bayes. arXiv 2013, arXiv:1312.6114. [Google Scholar]
- Yarats, D.; Zhang, A.; Kostrikov, I.; Amos, B.; Pineau, J.; Fergus, R. Improving sample efficiency in model-free reinforcement learning from images. In Proceedings of the 35th Association for the Advancement of Artificial Intelligence, AAAI 2021, Online, 2–9 February 2021; pp. 10674–10681. [Google Scholar]
- Agrawal, P.; Nair, A.V.; Abbeel, P.; Malik, J.; Levine, S. Learning to poke by poking: Experiential learning of intuitive physics. In Proceedings of the 30th Advances in Neural Information Processing Systems, NeurIPS 2016, Barcelona, Spain, 5–10 December 2016; pp. 5092–5100. [Google Scholar]
- Pathak, D.; Agrawal, P.; Efros, A.A.; Darrell, T. Curiosity-driven exploration by self-supervised prediction. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, Australia, 6–11 August 2017; pp. 2778–2787. [Google Scholar]
- Pathak, D.; Mahmoudieh, P.; Luo, G.; Agrawal, P.; Chen, D.; Shentu, Y.; Shelhamer, E.; Malik, J.; Efros, A.A.; Darrell, T. Zero-shot visual imitation. In Proceedings of the IEEE/CVF Computer Vision and Pattern Recognition Workshop, CVPR 2018, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2050–2053. [Google Scholar]
- Böhmer, W.; Springenberg, J.T.; Boedecker, J.; Riedmiller, M.; Obermayer, K. Autonomous learning of state representations for control: An emerging field aims to autonomously learn state representations for reinforcement learning agents from their real-world sensor observations. Künstl. Intell. 2015, 29, 353–362. [Google Scholar] [CrossRef]
- Lesort, T.; Díaz-Rodríguez, N.; Goudou, J.F.; Filliat, D. State representation learning for control: An overview. Neural Netw. 2018, 108, 379–392. [Google Scholar] [CrossRef] [PubMed]
- Oord, A.v.d.; Li, Y.; Vinyals, O. Representation learning with contrastive predictive coding. arXiv 2018, arXiv:1807.03748. [Google Scholar]
- He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Computer Vision and Pattern Recognition, CVPR 2020, Online, 14–19 June 2020. [Google Scholar]
- Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, Online, 12–18 July 2020. [Google Scholar]
- Schwarzer, M.; Anand, A.; Goel, R.; Hjelm, R.D.; Courville, A.; Bachman, P. Data-efficient reinforcement learning with self-predictive representations. In Proceedings of the 9th International Conference on Learning Representations, ICLR 2021, Online, 3–7 May 2021. [Google Scholar]
- Lee, K.H.; Fischer, I.; Liu, A.; Guo, Y.; Lee, H.; Canny, J.; Guadarrama, S. Predictive information accelerates learning in rl. In Proceedings of the 34th Advances in Neural Information Processing Systems, NeurIPS 2020, Online, 6–12 December 2020; pp. 11890–11901. [Google Scholar]
- Mnih, V.; Badia, A.P.; Mirza, M.; Graves, A.; Lillicrap, T.; Harley, T.; Silver, D.; Kavukcuoglu, K. Asynchronous methods for deep reinforcement learning. In Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York, NY, USA, 19–24 June 2016; pp. 1928–1937. [Google Scholar]
- Bellemare, M.G.; Naddaf, Y.; Veness, J.; Bowling, M. The arcade learning environment: An evaluation platform for general agents. J. Artif. Intell. Res. 2013, 47, 253–279. [Google Scholar] [CrossRef]
- Anand, A.; Racah, E.; Ozair, S.; Bengio, Y.; Côté, M.A.; Hjelm, R.D. Unsupervised state representation learning in atari. In Proceedings of the 33rd Advances in Neural Information Processing Systems, NeurIPS 2019, Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
- Grill, J.B.; Strub, F.; Altché, F.; Tallec, C.; Richemond, P.H.; Buchatskaya, E.; Doersch, C.; Pires, B.A.; Guo, Z.D.; Azar, M.G.; et al. Bootstrap your own latent: A new approach to self-supervised learning. In Proceedings of the 34th Advances in Neural Information Processing Systems, NeurIPS 2020, Online, 6–12 December 2020; pp. 21271–21284. [Google Scholar]
- Fischer, I. The conditional entropy bottleneck. Entropy 2020, 22, 999. [Google Scholar] [CrossRef] [PubMed]
- Zhang, A.; McAllister, R.; Calandra, R.; Gal, Y.; Levine, S. Learning invariant representations for reinforcement learning without reconstruction. In Proceedings of the 9th International Conference on Learning Representations, ICLR 2021, Online, 3–7 May 2021. [Google Scholar]
- Agarwal, R.; Machado, M.C.; Castro, P.S.; Bellemare, M.G. Contrastive behavioral similarity embeddings for generalization in reinforcement learning. In Proceedings of the 9th International Conference on Learning Representations, ICLR 2021, Online, 3–7 May 2021. [Google Scholar]
- Ferns, N.; Precup, D. Bisimulation Metrics are Optimal Value Functions. In Proceedings of the 30th Association for Uncertainty in Artificial Intelligence, UAI 2014, Quebec City, QC, Canada, 23–27 July 2014. [Google Scholar]
- Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by Generative Pre-Training. 2018. Available online: https://openai.com/blog/language-unsupervised (accessed on 15 January 2020).
- Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language models are unsupervised multitask learners. OpenAI Blog 2019. Available online: https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf (accessed on 15 January 2020).
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 17th North American Chapter of the Association for Computational Linguistics, NAACL 2019, Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186. [Google Scholar]
- Zbontar, J.; Jing, L.; Misra, I.; LeCun, Y.; Deny, S. Barlow twins: Self-supervised learning via redundancy reduction. In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, Online, 18–24 July 2021. [Google Scholar]
- Bardes, A.; Ponce, J.; LeCun, Y. Vicreg: Variance-invariance-covariance regularization for self-supervised learning. In Proceedings of the 10th International Conference on Learning Representations, ICLR 2022, Online, 25–29 April 2022. [Google Scholar]
- Devin, C.; Abbeel, P.; Darrell, T.; Levine, S. Deep object-centric representations for generalizable robot learning. In Proceedings of the IEEE International Conference on Robotics and Automation, ICRA 2018, Brisbane, Australia, 21–25 May 2018. [Google Scholar]
- Pathak, D.; Gandhi, D.; Gupta, A. Self-supervised exploration via disagreement. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, Long Beach, CA, USA, 10–15 June 2019; pp. 5062–5071. [Google Scholar]
- Burda, Y.; Edwards, H.; Storkey, A.; Klimov, O. Exploration by random network distillation. In Proceedings of the 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
- Aubret, A.; Matignon, L.; Hassas, S. A survey on intrinsic motivation in reinforcement learning. arXiv 2019, arXiv:1908.06976. [Google Scholar]
- Nguyen, T.; Luu, T.M.; Vu, T.; Yoo, C.D. Sample-efficient reinforcement learning representation learning with curiosity contrastive forward dynamics model. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2021, Online, 27 September–1 October 2021; pp. 3471–3477. [Google Scholar]
- Laskin, M.; Yarats, D.; Liu, H.; Lee, K.; Zhan, A.; Lu, K.; Cang, C.; Pinto, L.; Abbeel, P. URLB: Unsupervised reinforcement learning benchmark. In Proceedings of the 35th Advances in Neural Information Processing Systems, NeurIPS 2021, Online, 6–14 December 2021. [Google Scholar]
- Yarats, D.; Fergus, R.; Lazaric, A.; Pinto, L. Reinforcement learning with prototypical representations. In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, Online, 18–24 July 2021; pp. 11920–11931. [Google Scholar]
- Lee, L.; Eysenbach, B.; Parisotto, E.; Xing, E.; Levine, S.; Salakhutdinov, R. Efficient exploration via state marginal matching. arXiv 2019, arXiv:1906.05274. [Google Scholar]
- Eysenbach, B.; Gupta, A.; Ibarz, J.; Levine, S. Diversity is all you need: Learning skills without a reward function. In Proceedings of the 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
- Hansen, S.; Dabney, W.; Barreto, A.; Van de Wiele, T.; Warde-Farley, D.; Mnih, V. Fast task inference with variational intrinsic successor features. In Proceedings of the 8th International Conference on Learning Representations, ICLR 2020, Online, 26 April–1 May 2020. [Google Scholar]
- Liu, H.; Abbeel, P. Aps: Active pretraining with successor features. In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, Online, 18–24 July 2021; pp. 6736–6747. [Google Scholar]
- Singh, H.; Misra, N.; Hnizdo, V.; Fedorowicz, A.; Demchuk, E. Nearest neighbor estimates of entropy. Am. J. Math. Manag. Sci. 2003, 23, 301–321. [Google Scholar] [CrossRef]
- Caron, M.; Misra, I.; Mairal, J.; Goyal, P.; Bojanowski, P.; Joulin, A. Unsupervised learning of visual features by contrasting cluster assignments. In Proceedings of the 34th Advances in Neural Information Processing Systems, NeurIPS 2020, Online, 6–12 December 2020; pp. 9912–9924. [Google Scholar]
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 1998. [Google Scholar]
- Kaelbling, L.P.; Littman, M.L.; Cassandra, A.R. Planning and acting in partially observable stochastic domains. Artif. Intell. 1998, 101, 99–134. [Google Scholar] [CrossRef] [Green Version]
- Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholm, Sweden, 10–15 July 2018; pp. 1861–1870. [Google Scholar]
- Ziebart, B.D. Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy; Carnegie Mellon University: Pittsburgh, PA, USA, 2010; 216p. [Google Scholar]
- Hafner, D.; Lillicrap, T.; Fischer, I.; Villegas, R.; Ha, D.; Lee, H.; Davidson, J. Learning latent dynamics for planning from pixels. In Proceedings of the International Conference on Machine Learning, ICML 2019, Long Beach, CA, USA, 10–15 June 2019; pp. 2555–2565. [Google Scholar]
- Hafner, D.; Lillicrap, T.; Ba, J.; Norouzi, M. Dream to Control: Learning Behaviors by Latent Imagination. In Proceedings of the 8th International Conference on Learning Representations, ICLR 2020, Online, 26 April–1 May 2020. [Google Scholar]
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the 5th International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, 14–16 April 2014; pp. 1–15. [Google Scholar]
- Shu, R.; Nguyen, T.; Chow, Y.; Pham, T.; Than, K.; Ghavamzadeh, M.; Ermon, S.; Bui, H. Predictive coding for locally-linear control. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, Online, 12–18 July 2020. [Google Scholar]
Parameter | RAD [17] | DrQ [18] |
---|---|---|
Action repeat | 2 (Finger, spin; Walker, walk); 8 (Cartpole, swingup); 4 otherwise | |
Batch size | 512 | 512 |
Augmentation | Crop [17] | Crop [18] |
Replay buffer size | 10⁵ | 10⁵ |
Initial steps | 1000 | 1000 |
Stacked frames | 3 | 3 |
Discount γ | 0.99 | 0.99 |
Optimizer | Adam | Adam |
Learning rate (πe, πa, Q) | 2 × 10⁻⁴ (Cheetah); 1 × 10⁻³ otherwise | 1 × 10⁻³ (all) |
Learning rate (α) | 1 × 10⁻⁴ | 1 × 10⁻⁴ |
Critic target update frequency | 2 | 2 |
Critic soft-update rate τ | 0.01 | 0.01 |
Actor update frequency | 2 | 2 |
Actor log stddev bounds | [−10, 2] | [−10, 2] |
Initial temperature | 0.1 | 0.1 |
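
The hyperparameters listed above could be collected in a single configuration object. The following is a minimal sketch rather than the authors' code: the class, field, and function names and the task identifier strings (e.g., `"cartpole_swingup"`) are hypothetical, and only the numeric values come from the table. The table lists the action-repeat schedule under the RAD column only; the sketch exposes it as one shared helper, which is an assumption. The `soft_update` helper illustrates the standard Polyak averaging that the critic soft-update rate τ refers to in SAC-style methods.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SACHyperparams:
    """Values transcribed from the hyperparameter table; names are illustrative."""
    batch_size: int = 512
    replay_buffer_size: int = 10**5
    initial_steps: int = 1000
    stacked_frames: int = 3
    discount: float = 0.99
    alpha_lr: float = 1e-4
    critic_target_update_freq: int = 2
    critic_tau: float = 0.01
    actor_update_freq: int = 2
    actor_log_std_bounds: tuple = (-10.0, 2.0)
    init_temperature: float = 0.1


def rad_learning_rate(task: str) -> float:
    """RAD column: 2e-4 for Cheetah tasks, 1e-3 otherwise (DrQ uses 1e-3 for all)."""
    return 2e-4 if task.startswith("cheetah") else 1e-3


def action_repeat(task: str) -> int:
    """Action-repeat schedule from the table (task strings are hypothetical)."""
    if task in ("finger_spin", "walker_walk"):
        return 2
    if task == "cartpole_swingup":
        return 8
    return 4


def soft_update(target, online, tau):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]
```

With τ = 0.01 the target critic moves only 1% of the way toward the online critic at each target update, which keeps the bootstrapped targets slowly varying.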
Task | RAD [17], Scratch | RAD [17], +VPCPM | DrQ [18], Scratch | DrQ [18], +VPCPM |
---|---|---|---|---|
100k steps | | | | |
Finger, spin | 860 ± 29 | 974 ± 15 | 901 ± 104 | 958 ± 20 |
Cartpole, swingup | 454 ± 155 | 798 ± 42 | 759 ± 92 | 825 ± 40 |
Reacher, easy | 704 ± 212 | 855 ± 117 | 601 ± 213 | 720 ± 65 |
Cheetah, run | 365 ± 31 | 420 ± 81 | 344 ± 67 | 384 ± 45 |
Walker, walk | 493 ± 175 | 614 ± 128 | 612 ± 164 | 651 ± 115 |
Ball in cup, catch | 421 ± 247 | 918 ± 25 | 913 ± 53 | 959 ± 7 |
500k steps | | | | |
Finger, spin | 982 ± 2 | 985 ± 15 | 938 ± 103 | 988 ± 5 |
Cartpole, swingup | 867 ± 10 | 870 ± 13 | 868 ± 10 | 877 ± 11 |
Reacher, easy | 945 ± 34 | 965 ± 41 | 942 ± 71 | 945 ± 43 |
Cheetah, run | 573 ± 35 | 670 ± 7 | 660 ± 96 | 698 ± 47 |
Walker, walk | 948 ± 11 | 952 ± 15 | 921 ± 45 | 960 ± 10 |
Ball in cup, catch | 962 ± 5 | 964 ± 13 | 963 ± 9 | 968 ± 4 |
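
As a rough aid for reading the 100k-step rows, the sketch below averages the reported mean scores over the six tasks and counts the tasks on which +VPCPM scores higher than training from scratch. The unweighted average is not a statistic reported in the article, the ± spreads are ignored, and the variable and function names are illustrative only.

```python
# Mean scores at 100k steps, copied from the table: (Scratch, +VPCPM) per method.
SCORES_100K = {
    "Finger, spin":       {"RAD": (860, 974), "DrQ": (901, 958)},
    "Cartpole, swingup":  {"RAD": (454, 798), "DrQ": (759, 825)},
    "Reacher, easy":      {"RAD": (704, 855), "DrQ": (601, 720)},
    "Cheetah, run":       {"RAD": (365, 420), "DrQ": (344, 384)},
    "Walker, walk":       {"RAD": (493, 614), "DrQ": (612, 651)},
    "Ball in cup, catch": {"RAD": (421, 918), "DrQ": (913, 959)},
}


def mean_scores(scores):
    """Unweighted mean over tasks, returned as {method: (scratch, pretrained)}."""
    out = {}
    for method in ("RAD", "DrQ"):
        scratch = [row[method][0] for row in scores.values()]
        pretrained = [row[method][1] for row in scores.values()]
        out[method] = (sum(scratch) / len(scratch),
                       sum(pretrained) / len(pretrained))
    return out


def tasks_improved(scores, method):
    """Number of tasks where the +VPCPM mean exceeds the scratch mean."""
    return sum(1 for row in scores.values() if row[method][1] > row[method][0])


if __name__ == "__main__":
    for method, (scratch, pretrained) in mean_scores(SCORES_100K).items():
        print(f"{method}: scratch={scratch:.1f}, +VPCPM={pretrained:.1f}, "
              f"tasks improved={tasks_improved(SCORES_100K, method)}/6")
```

Because several per-task spreads are large (for example, ± 247 for RAD from scratch on Ball in cup, catch), differences in these averages should be read as indicative rather than as a significance test.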
Source Task | Target Task |
---|---|
Cartpole-swingup | Cartpole-swingup_sparse; Cartpole-balance; Cartpole-balance_sparse |
Walker-walk | Walker-stand |
Reacher-easy | Reacher-hard |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Luu, T.M.; Vu, T.; Nguyen, T.; Yoo, C.D. Visual Pretraining via Contrastive Predictive Model for Pixel-Based Reinforcement Learning. Sensors 2022, 22, 6504. https://doi.org/10.3390/s22176504