Enhancing Model Agnostic Meta-Learning via Gradient Similarity Loss
Abstract
1. Introduction
2. Related Works
2.1. Model Agnostic Meta-Learning
Algorithm 1 Model Agnostic Meta-Learning [8]
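A minimal PyTorch sketch of the MAML loop summarized by Algorithm 1 [8], written for a toy linear-regression task; the model, task sampler, and hyperparameter values (`alpha`, `beta`, `meta_batch_size`, `inner_steps`) are illustrative assumptions, not the authors' setup.

```python
import torch

# Illustrative hyperparameters; alpha/beta follow MAML's inner/outer notation.
alpha, beta, meta_batch_size, inner_steps = 0.01, 0.001, 4, 1

# Toy model kept as explicit tensors so the inner-loop update stays
# differentiable ("fast weights").
w = torch.randn(1, 5, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

def loss_fn(w, b, x, y):
    return ((x @ w.t() + b - y) ** 2).mean()

def sample_task():
    # Placeholder task sampler: linear regression toward a task-specific c.
    c = torch.randn(5, 1)
    xs, xq = torch.randn(10, 5), torch.randn(10, 5)
    return (xs, xs @ c), (xq, xq @ c)

opt = torch.optim.Adam([w, b], lr=beta)
for _ in range(1000):
    meta_loss = 0.0
    for _ in range(meta_batch_size):
        (xs, ys), (xq, yq) = sample_task()
        fw, fb = w, b
        for _ in range(inner_steps):
            # create_graph=True keeps the inner step differentiable; this is
            # exactly where MAML's second-order (Hessian) term enters.
            gw, gb = torch.autograd.grad(
                loss_fn(fw, fb, xs, ys), (fw, fb), create_graph=True)
            fw, fb = fw - alpha * gw, fb - alpha * gb  # adapted parameters
        meta_loss = meta_loss + loss_fn(fw, fb, xq, yq)  # query-set loss
    opt.zero_grad()
    (meta_loss / meta_batch_size).backward()
    opt.step()
```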
2.2. Optimization-Based First-Order Meta-Learning
2.3. Cosine Similarity Loss
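A generic form of this loss over two lists of gradient tensors is sketched below; the function name and the $1-\cos$ formulation are our assumptions, and individual methods may weight or combine the term differently.

```python
import torch
import torch.nn.functional as F

def cosine_similarity_loss(grads_a, grads_b, eps=1e-8):
    """Return 1 - cos(g_a, g_b) over flattened, concatenated gradients.

    Minimizing this term aligns the two gradient directions; because cosine
    similarity is scale-invariant, only the directions matter.
    """
    ga = torch.cat([g.reshape(-1) for g in grads_a])
    gb = torch.cat([g.reshape(-1) for g in grads_b])
    return 1.0 - F.cosine_similarity(ga, gb, dim=0, eps=eps)
```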
3. Preliminaries
3.1. Second-Order Computation from MAML
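With the usual MAML notation and a single inner step of size $\alpha$, the chain rule expands the meta-gradient analyzed in this section as follows (standard MAML algebra, not specific to this paper):

```latex
\theta_i' = \theta - \alpha \nabla_\theta \mathcal{L}_{\mathcal{T}_i}(\theta), \qquad
\nabla_\theta \mathcal{L}_{\mathcal{T}_i}(\theta_i')
  = \bigl( I - \alpha \nabla_\theta^2 \mathcal{L}_{\mathcal{T}_i}(\theta) \bigr)\,
    \nabla_{\theta'} \mathcal{L}_{\mathcal{T}_i}(\theta_i')
```

The $\nabla_\theta^2 \mathcal{L}$ factor is the second-order (Hessian) term; evaluating it exactly is what makes MAML's outer loop expensive, and Sections 3.2 and 3.3 examine how it can be computed or discarded.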
3.2. Hessian-Vector Product
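In reverse-mode autodiff frameworks, this product can be formed without ever materializing the Hessian, at the cost of roughly one extra backward pass (cf. Griewank's complexity bounds [27]). A minimal PyTorch sketch, with names of our choosing:

```python
import torch

def hessian_vector_product(loss, params, vec):
    """Compute (d^2 loss / d params^2) @ vec by double backprop.

    `vec` is a sequence of tensors shaped like `params`. The Hessian is never
    built explicitly: we differentiate the scalar <grad(loss), vec> instead.
    """
    grads = torch.autograd.grad(loss, params, create_graph=True)
    dot = sum((g * v).sum() for g, v in zip(grads, vec))
    return torch.autograd.grad(dot, params)
```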
3.3. Setting the Second-Order Term to Zero Is Effective
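Concretely, zeroing the second-order term gives the FOMAML update [17]: adapt on the support set without building a differentiable graph, then apply the query-set gradient taken at the adapted parameters directly to the meta-parameters. A minimal sketch reusing the `loss_fn(w, b, x, y)` signature from the MAML sketch in Section 2.1 (all names are illustrative):

```python
import torch

def fomaml_meta_grad(w, b, loss_fn, support, query, alpha=0.01, inner_steps=1):
    """First-order MAML meta-gradient: the Hessian term is treated as zero,
    so no second-order graph is ever built."""
    fw = w.detach().requires_grad_(True)
    fb = b.detach().requires_grad_(True)
    for _ in range(inner_steps):
        gw, gb = torch.autograd.grad(loss_fn(fw, fb, *support), (fw, fb))
        fw = (fw - alpha * gw).detach().requires_grad_(True)
        fb = (fb - alpha * gb).detach().requires_grad_(True)
    # Query-set gradient at the adapted parameters, applied to (w, b) as-is.
    return torch.autograd.grad(loss_fn(fw, fb, *query), (fw, fb))
```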
4. Method
4.1. Approximate Hessian Effect
4.2. How to Update Variables via Gradient Similarity Loss
4.2.1. Gradient Similarity Loss
4.2.2. MAML via AHE Update
Algorithm 2 MAML via AHE update
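The AHE update itself is specified by Algorithm 2; purely as a hedged illustration of how a gradient similarity term can enter a first-order meta-objective, one possible combination is sketched below. It reuses `w`, `b`, `loss_fn`, `sample_task`, and `opt` from the MAML sketch in Section 2.1, and the support/query gradient pairing and the weight `lam` are our assumptions, not the authors' algorithm.

```python
import torch

lam = 0.1  # illustrative weight for the similarity term

(xs, ys), (xq, yq) = sample_task()
# Gradients of the support and query losses at the current meta-parameters;
# create_graph=True makes the similarity term itself differentiable.
gs = torch.autograd.grad(loss_fn(w, b, xs, ys), (w, b), create_graph=True)
gq = torch.autograd.grad(loss_fn(w, b, xq, yq), (w, b), create_graph=True)
vs = torch.cat([g.reshape(-1) for g in gs])
vq = torch.cat([g.reshape(-1) for g in gq])
sim_loss = 1.0 - torch.dot(vs, vq) / (vs.norm() * vq.norm() + 1e-8)

total = loss_fn(w, b, xq, yq) + lam * sim_loss
opt.zero_grad()
total.backward()
opt.step()
```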
4.3. Comparative Analysis of Proposed Algorithm
5. Experiments
5.1. Implementation Details
5.1.1. Dataset
5.1.2. Experiment Setting
5.1.3. Evaluation Setup
5.1.4. Comparison Methods
5.2. Experimental Results
5.2.1. 5-Way 1-Shot Classification
5.2.2. 5-Way 5-Shot Classification
5.2.3. 20-Way 1-Shot Classification
5.2.4. 20-Way 5-Shot Classification
5.3. Additional Experiment Details
5.3.1. Adjustment of AHE Gradient Magnitude
5.3.2. Methods for Scheduling
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Vinyals, O.; Blundell, C.; Lillicrap, T. Matching networks for one-shot learning. Adv. Neural Inf. Process. Syst. 2016, 29, 3630–3638.
2. Hospedales, T.M.; Antoniou, A.; Micaelli, P.; Storkey, A. Meta-Learning in Neural Networks: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 5149–5169.
3. Huisman, M.; van Rijn, J.N.; Plaat, A. A survey of deep meta-learning. Artif. Intell. Rev. 2021, 54, 4483–4541.
4. Achille, A.; Lam, M.; Tewari, R.; Ravichandran, A.; Maji, S.; Fowlkes, C.C.; Soatto, S.; Perona, P. Task2Vec: Task Embedding for Meta-Learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019.
5. Wu, Z.; Shi, X.; Lin, G.; Cai, J. Learning Meta-class Memory for Few-Shot Semantic Segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021.
6. Santoro, A.; Bartunov, S.; Botvinick, M.; Wierstra, D.; Lillicrap, T. Meta-learning with memory-augmented neural networks. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016.
7. Lee, K.; Maji, S.; Ravichandran, A.; Soatto, S. Meta-Learning With Differentiable Convex Optimization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019.
8. Finn, C.; Abbeel, P.; Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017.
9. Ravi, S.; Larochelle, H. Optimization as a model for few-shot learning. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017.
10. Yuan, Y.; Zheng, G.; Wong, K.; Ottersten, B.; Luo, Z. Transfer Learning and Meta Learning-Based Fast Downlink Beamforming Adaptation. IEEE Trans. Wirel. Commun. 2020, 20, 1742–1755.
11. Khadka, R.; Jha, D.; Hicks, S.; Thambawita, V.; Riegler, M.A.; Ali, S.; Halvorsen, P. Meta-learning with implicit gradients in a few-shot setting for medical image segmentation. Comput. Biol. Med. 2022, 143, 105227.
12. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
13. Gu, J.; Wang, Y.; Chen, Y.; Cho, K.; Li, V. Meta-Learning for Low-Resource Neural Machine Translation. arXiv 2018, arXiv:1808.08437.
14. Li, B.; Gan, Z.; Chen, D.; Aleksandrovich, D.S. UAV Maneuvering Target Tracking in Uncertain Environments Based on Deep Reinforcement Learning and Meta-Learning. Remote Sens. 2020, 12, 3789.
15. Zhang, C.; Bengio, S.; Hardt, M.; Recht, B.; Vinyals, O. Understanding deep learning requires rethinking generalization. Commun. ACM 2021, 64, 107–115.
16. Li, Z.; Zhou, F.; Chen, F.; Li, H. Meta-SGD: Learning to learn quickly for few-shot learning. arXiv 2017, arXiv:1707.09835.
17. Nichol, A.; Achiam, J.; Schulman, J. On first-order meta-learning algorithms. arXiv 2018, arXiv:1803.02999.
18. Triantafillou, E.; Zemel, R.; Urtasun, R. Few-shot learning through an information retrieval lens. Adv. Neural Inf. Process. Syst. 2017, 30, 2255–2265.
19. Singh, R.; Bharti, V.; Purohit, V.; Kumar, A.; Singh, A.K.; Singh, S.K. MetaMed: Few-shot medical image classification using gradient-based meta-learning. Pattern Recognit. 2021, 120, 108111.
20. Rajeswaran, A.; Finn, C.; Kakade, S.M.; Levine, S. Meta-learning with implicit gradients. Adv. Neural Inf. Process. Syst. 2019, 32, 113–124.
21. Zhou, P.; Yuan, X.; Xu, H.; Yan, S.; Feng, J. Efficient meta learning via minibatch proximal update. Adv. Neural Inf. Process. Syst. 2019, 32, 1534–1544.
22. Kedia, A.; Chinthakindi, S.C.; Ryu, W. Beyond Reptile: Meta-Learned Dot-Product Maximization between Gradients for Improved Single-Task Regularization. In Findings of the Association for Computational Linguistics: EMNLP 2021; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021.
23. Bai, Y.; Chen, M.; Zhou, P.; Zhao, T.; Lee, J.; Kakade, S.; Wang, H.; Xiong, C. How important is the train-validation split in meta-learning? In Proceedings of the International Conference on Machine Learning, Online, 18–24 July 2021; pp. 543–553.
24. Fan, C.; Ram, P.; Liu, S. Sign-MAML: Efficient model-agnostic meta-learning by SignSGD. arXiv 2021, arXiv:2109.07497.
25. Falato, M.J.; Wolfe, B.; Natan, T.M.; Zhang, X.; Marshall, R.; Zhou, Y.; Bellan, P.; Wang, Z. Plasma image classification using cosine similarity constrained convolutional neural network. J. Plasma Phys. 2022, 88, 895880603.
26. Tao, Z.; Huang, S.; Wang, G. Prototypes Sampling Mechanism for Class Incremental Learning. IEEE Access 2023, 11, 81942–81952.
27. Griewank, A. Some bounds on the complexity of gradients, Jacobians, and Hessians. In Complexity in Numerical Optimization; World Scientific: Singapore, 1993; pp. 128–162.
28. Rusu, A.A.; Rao, D.; Sygnowski, J.; Vinyals, O.; Pascanu, R.; Osindero, S.; Hadsell, R. Meta-learning with latent embedding optimization. arXiv 2018, arXiv:1807.05960.
29. Munkhdalai, T.; Yu, H. Meta networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; Volume 70, pp. 2554–2563.
30. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
31. Arnold, S.M.; Mahajan, P.; Datta, D.; Bunner, I.; Zarkias, K.S. learn2learn: A library for Meta-Learning research. arXiv 2020, arXiv:2008.12284.
32. Fallah, A.; Mokhtari, A.; Ozdaglar, A. Personalized federated learning with theoretical guarantees: A model-agnostic meta-learning approach. Adv. Neural Inf. Process. Syst. 2020, 33, 3557–3568.
33. Finn, C.; Rajeswaran, A.; Kakade, S.; Levine, S. Online meta-learning. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 1920–1930.
| Description | Symbol |
|---|---|
| Task distribution | $p(\mathcal{T})$ |
| Learning rates of the inner and outer loops | $\alpha$, $\beta$ |
| Number of inner-loop update iterations | |
| Meta-batch size of the outer loop | |
| Gradient of the inner-loop loss | |
| Approximated variables | |
| Defined step size | |
| Interpolation rate in $(0, 1]$ | |
| Learning rate | |
| Method | Computational Complexity | Space Complexity |
|---|---|---|
| FOMAML | | |
| MAML | | |
| AHE (Ours) | | |
| Method | 5-Way 1-Shot | 5-Way 5-Shot | 20-Way 1-Shot | 20-Way 5-Shot |
|---|---|---|---|---|
| MAML | | | | |
| FOMAML | | | | |
| AHE (Ours) | | | | |