An Improved Reacceleration Optimization Algorithm Based on the Momentum Method for Image Recognition
Abstract
1. Introduction
- We propose an improved momentum reacceleration gradient descent algorithm (MRGD) based on the momentum method and evaluate it experimentally on multiple image classification datasets. The experiments show that MRGD achieves higher accuracy than the traditional momentum method and the Adam algorithm.
- To address the sparsity problem, we combine MRGD with the Adam algorithm to obtain the MRGDAdam algorithm. The experimental results demonstrate that MRGDAdam converges faster than Adam and attains higher accuracy. The results also indicate that the proposed method generalizes across tasks.
- We analyze how the descent rate of stochastic gradient descent is influenced by the relationship between the gradient and the momentum term, and verify the algorithm on practical image classification tasks. The experimental results demonstrate the potential of MRGD both for the study of optimization algorithms and for practical tasks, providing a more effective optimization choice for training deep learning models.
- The proposed algorithm offers a new direction for future research on optimization algorithms, and its training efficiency benefits practical applications of deep learning.
- The paper is organized as follows. Section 2 reviews the development of stochastic gradient descent algorithms, covering both constant-learning-rate and adaptive-learning-rate methods. Section 3 introduces the basic principles of the SGD algorithm, analyzes the gradient descent speed issue, and proposes the MRGD and MRGDAdam algorithms. Section 4 presents our experiments and analyzes the results, demonstrating the effectiveness of the proposed methods. Finally, Section 5 concludes the paper.
2. Related Work
2.1. Constant Learning Rate Gradient Descent Algorithm
2.2. Adaptive Learning Rate Algorithm
3. Methods
3.1. SGD Method
3.1.1. Momentum Method
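As background for the reacceleration idea in Section 3.2, the following is a minimal NumPy sketch of the classical (heavy-ball) momentum update, using one common convention; the function and parameter names are illustrative and not taken from the paper.

```python
import numpy as np

def momentum_step(param, grad, velocity, lr=0.01, beta=0.9):
    """One classical (heavy-ball) momentum update, illustrative only.

    The velocity accumulates an exponentially weighted sum of past gradients,
    and the parameter moves along that accumulated direction.
    """
    velocity = beta * velocity + grad      # accumulate momentum
    param = param - lr * velocity          # step along the momentum direction
    return param, velocity
```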
3.1.2. Adam Algorithm
Algorithm 1: Adam
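For reference, a minimal NumPy sketch of the standard Adam update is given below; the `adam_step` helper and its default hyperparameters are illustrative and may not match the settings used in the paper's experiments.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update step (illustrative sketch)."""
    m = beta1 * m + (1 - beta1) * grad           # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1 ** t)                 # bias correction for the warm-up phase
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```

Here `t` counts update steps starting from 1; the bias correction compensates for the zero initialization of `m` and `v`.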
3.2. The Proposed Momentum Reacceleration Gradient Descent Algorithm
Algorithm 2: MRGD
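The exact MRGD update is given in Algorithm 2. Purely as a hypothetical illustration of the idea stated in Section 1, that the descent rate should respond to the relationship between the current gradient and the accumulated momentum, the sketch below rescales the momentum step by their directional agreement. The scaling rule, the `gamma` parameter, and all names are assumptions introduced here, not the paper's update rule.

```python
import numpy as np

def reaccel_momentum_step(param, grad, velocity, lr=0.01, beta=0.9, gamma=0.5):
    """Hypothetical sketch only -- this is NOT the paper's MRGD update rule.

    Assumption: when the current gradient agrees in direction with the
    accumulated momentum, the step is re-accelerated; when they disagree,
    it is damped.
    """
    velocity = beta * velocity + grad
    agreement = float(np.dot(grad.ravel(), velocity.ravel())) / (
        np.linalg.norm(grad) * np.linalg.norm(velocity) + 1e-12
    )                                       # cosine of the angle between grad and velocity
    scale = 1.0 + gamma * agreement         # ranges over [1 - gamma, 1 + gamma]
    param = param - lr * scale * velocity
    return param, velocity
```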
Algorithm 3: MRGDAdam
3.3. Experimental Methods
3.3.1. Datasets and Tasks
3.3.2. Model Architecture
3.3.3. Experimental Environment and Parameter Settings
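Before turning to the results, the following is a generic sketch of how the optimizer comparison in Section 4 could be set up in a PyTorch-style pipeline; the model, learning rate, and data loader are placeholders rather than the paper's exact configuration, and a custom MRGD or MRGDAdam optimizer would be swapped in where `torch.optim.Adam` appears.

```python
import torch
import torch.nn as nn
import torchvision

# Placeholder set-up: train the same model on one dataset with one optimizer at a
# time, then record test loss / accuracy, as in the tables of Section 4.
model = torchvision.models.resnet18(num_classes=10)            # placeholder architecture
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)      # swap in SGD, RMSprop, or a custom MRGD/MRGDAdam here

def train_one_epoch(loader):
    model.train()
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```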
4. Experimental Results and Analysis
4.1. Experimental Results
4.2. Experimental Analysis
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
| Optimizer | MNIST Test Loss | MNIST Test Acc. | CIFAR-10 Test Loss | CIFAR-10 Test Acc. | CIFAR-100 Test Loss | CIFAR-100 Test Acc. | Aluminum Profile Test Loss | Aluminum Profile Test Acc. |
|---|---|---|---|---|---|---|---|---|
| MRGD | 0.0215 | 0.9946 | 0.7336 | 0.858 | 2.009 | 0.5872 | 0.8023 | 0.7964 |
| SGD | 0.0393 | 0.9876 | 1.036 | 0.6302 | 3.042 | 0.2572 | 1.414 | 0.5734 |
| MRGDAdam | 0.0223 | 0.9943 | 0.7683 | 0.8495 | 2.01 | 0.5756 | 0.8011 | 0.8021 |
| Adam | 0.0294 | 0.9921 | 0.6549 | 0.8571 | 2.54 | 0.5614 | 0.8322 | 0.7569 |
| RMSprop | 0.0286 | 0.993 | 0.7676 | 0.8549 | 2.925 | 0.5311 | 0.8566 | 0.7294 |
| Optimizer | MNIST Test Loss | MNIST Test Acc. | CIFAR-10 Test Loss | CIFAR-10 Test Acc. | CIFAR-100 Test Loss | CIFAR-100 Test Acc. | Aluminum Profile Test Loss | Aluminum Profile Test Acc. |
|---|---|---|---|---|---|---|---|---|
| MRGD | 0.0255 | 0.9956 | 0.7102 | 0.849 | 2.163 | 0.4754 | 0.932 | 0.7115 |
| SGD | 0.0968 | 0.9798 | 1.1901 | 0.642 | 3.664 | 0.1545 | 1.8173 | 0.4936 |
| MRGDAdam | 0.0272 | 0.9901 | 0.6779 | 0.8618 | 2.412 | 0.5034 | 0.8217 | 0.7323 |
| Adam | 0.0303 | 0.9850 | 0.6483 | 0.8360 | 2.351 | 0.3605 | 0.9674 | 0.6798 |
| RMSprop | 0.0384 | 0.9856 | 0.6460 | 0.8261 | 2.733 | 0.3227 | 0.9827 | 0.6619 |
| Optimizer | MNIST Test Loss | MNIST Test Acc. | CIFAR-10 Test Loss | CIFAR-10 Test Acc. | CIFAR-100 Test Loss | CIFAR-100 Test Acc. | Aluminum Profile Test Loss | Aluminum Profile Test Acc. |
|---|---|---|---|---|---|---|---|---|
| MRGD | 0.0314 | 0.992 | 0.7616 | 0.7868 | 2.365 | 0.4398 | 1.103 | 0.6950 |
| SGD | 0.1260 | 0.9613 | 1.5325 | 0.4235 | 4.077 | 0.0704 | 2.7967 | 0.4613 |
| MRGDAdam | 0.0372 | 0.9913 | 0.8863 | 0.7604 | 2.571 | 0.4465 | 0.9435 | 0.7091 |
| Adam | 0.0351 | 0.9904 | 1.0278 | 0.7544 | 3.068 | 0.3908 | 1.8641 | 0.5546 |
| RMSprop | 0.0323 | 0.9944 | 0.9765 | 0.7683 | 3.024 | 0.365 | 1.986 | 0.5454 |