An Improved BGE-Adam Optimization Algorithm Based on Entropy Weighting and Adaptive Gradient Strategy
Abstract
1. Introduction
2. Related Work
3. Design of the BGE-Adam Algorithm
3.1. Dynamically Adjusted β-Parameter Mechanisms
3.2. Gradient Prediction Model
3.3. Entropy Weights
3.4. BGE-Adam Algorithm
Algorithm 1: BGE-Adam.
Input: initial point θ₀, first-moment decay β₁, second-moment decay β₂, regularization constant ε.
Output: resulting parameters θ_t.
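Since the full pseudocode of Algorithm 1 is not reproduced above, the sketch below only illustrates how the three components described in Section 3 (dynamically adjusted β parameters, a predicted gradient blended through alpha, and an entropy-weighted perturbation) can be combined into an Adam-style parameter update. The concrete β schedule, gradient predictor, and entropy term are the authors' design; the expressions used here are placeholder assumptions, not the published BGE-Adam update rules.

```python
# Illustrative Adam-style step combining the three ideas named in Section 3:
# (1) betas interpolated between their min/max bounds from a gradient-change signal,
# (2) a predicted gradient blended with the observed gradient through alpha,
# (3) a small entropy-weighted random perturbation of the update.
# All three formulas are placeholders, not the published BGE-Adam update rules.
import torch


@torch.no_grad()
def bge_adam_like_step(param, state, lr=1e-3, alpha=0.5, eps=1e-8,
                       beta1_min=0.5, beta1_max=0.9,
                       beta2_min=0.9, beta2_max=0.999,
                       entropy_weight=0.01):
    grad = param.grad
    prev_grad = state.get("prev_grad", torch.zeros_like(grad))

    # (1) Dynamic betas: a large relative gradient change pushes the betas
    #     toward their lower bounds (placeholder heuristic).
    change = (grad - prev_grad).norm() / (prev_grad.norm() + eps)
    scale = torch.exp(-change)
    beta1 = beta1_min + (beta1_max - beta1_min) * scale
    beta2 = beta2_min + (beta2_max - beta2_min) * scale

    # (2) Gradient prediction: blend a simple linear extrapolation with the
    #     observed gradient, weighted by alpha.
    predicted = 2 * grad - prev_grad
    g = alpha * predicted + (1 - alpha) * grad

    # (3) Adam moments with bias correction.
    t = state.get("t", 0) + 1
    m = beta1 * state.get("m", torch.zeros_like(grad)) + (1 - beta1) * g
    v = beta2 * state.get("v", torch.zeros_like(grad)) + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)

    # (4) Entropy-weighted perturbation added to the standard Adam update.
    noise = entropy_weight * torch.randn_like(param)
    param -= lr * m_hat / (v_hat.sqrt() + eps) + noise

    state.update(t=t, m=m, v=v, prev_grad=grad.clone())
    return state
```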
4. Experimental Results and Analyses
4.1. Experimental Environment and Configuration
4.2. Experimental Results and Analysis
4.3. Experimental Setup
- (1) Optimization Algorithms: Comparative experiments were conducted between the BGE-Adam optimization algorithm and six existing optimization algorithms: SGD, Adam, Adadelta, Adamax, Adagrad, and NAdam.
- (2) Dataset Selection: Ten sets of comparative experiments were conducted with the seven optimization algorithms on the traditional MNIST and CIFAR10 datasets and on a medical image dataset for gastrointestinal disease diagnostics. The average of the ten runs was taken as the final experimental outcome.
- (3) Network Model Selection: The comparative experiments in this study use the PyTorch deep learning framework and the lightweight MobileNetV2 neural network. To evaluate optimizer performance fairly, the BGE-Adam optimization algorithm and the six other optimizers in the comparison all employ the same network architecture.
- (4) Hyperparameter Initialization: Apart from the parameters specific to each optimizer, the same initial hyperparameters, including the learning rate and weight decay, are set for every optimizer. The batch size is 128, and the number of training iterations is set to 100.
- (5) Model Training: Each optimizer is used to train the model separately for 50 epochs to ensure the repeatability of the experimental results; a minimal sketch of this comparison loop is given after this list.
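The following is a minimal sketch of how such a comparison can be wired up in PyTorch under the setup above (MobileNetV2, identical initial hyperparameters, batch size 128). The dataset choice, helper names, and the commented-out BGEAdam entry are illustrative assumptions rather than the authors' released code.

```python
# Minimal sketch of the optimizer-comparison setup described above.
# Assumptions (not taken from the paper's code): torchvision's CIFAR10 as the
# example dataset and a hypothetical BGEAdam class with an Adam-like constructor.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms


def train_with(optimizer_factory, loader, num_classes=10, epochs=50, device="cpu"):
    # Every optimizer trains the same MobileNetV2 architecture from scratch.
    model = models.mobilenet_v2(num_classes=num_classes).to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optimizer_factory(model.parameters())
    for _ in range(epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model


if __name__ == "__main__":
    transform = transforms.ToTensor()
    train_set = datasets.CIFAR10("./data", train=True, download=True, transform=transform)
    loader = DataLoader(train_set, batch_size=128, shuffle=True)  # batch size 128, as in the setup

    # Identical initial hyperparameters (learning rate, weight decay) for all optimizers.
    lr, wd = 1e-3, 0.0
    optimizers = {
        "SGD": lambda p: torch.optim.SGD(p, lr=lr, weight_decay=wd),
        "Adam": lambda p: torch.optim.Adam(p, lr=lr, weight_decay=wd),
        "Adadelta": lambda p: torch.optim.Adadelta(p, lr=lr, weight_decay=wd),
        "Adamax": lambda p: torch.optim.Adamax(p, lr=lr, weight_decay=wd),
        "Adagrad": lambda p: torch.optim.Adagrad(p, lr=lr, weight_decay=wd),
        "NAdam": lambda p: torch.optim.NAdam(p, lr=lr, weight_decay=wd),
        # "BGE-Adam": lambda p: BGEAdam(p, lr=lr, weight_decay=wd),  # hypothetical class
    }
    for name, factory in optimizers.items():
        print(f"Training MobileNetV2 with {name}")
        train_with(factory, loader)
```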
4.4. Experimental Implementation
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
| Hyperparameter | Description | Default Value |
|---|---|---|
| lr (Learning Rate) | Controls the update step size of the model at each iteration | 0.001 |
| alpha | Determines the weight ratio of the predicted gradient to the actual gradient | 0.5 |
| betas (β parameters) | A pair of values used to compute the moving averages of the gradient and its square | (0.9, 0.999) |
| eps (ε) | A small number to prevent division by zero errors | 1 × 10⁻⁸ |
| weight_decay | Weight decay, used for regularization and to prevent overfitting | 0 |
| entropy_weight | Entropy weight, used to introduce randomness into the parameter space during optimization | 0.01 |
| amsgrad | Boolean value indicating whether to use the AMSGrad variant to prevent sudden changes in gradient updates | False |
| beta1_max | The maximum adjustment value of β₁ | 0.9 |
| beta1_min | The minimum adjustment value of β₁ | 0.5 |
| beta2_max | The maximum adjustment value of β₂ | 0.999 |
| beta2_min | The minimum adjustment value of β₂ | 0.9 |
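For reference, the defaults in the table above can be collected into a plain configuration dictionary. The dictionary keys simply mirror the table; passing them to a constructor such as `BGEAdam(model.parameters(), **bge_adam_defaults)` assumes a hypothetical class that exposes exactly these arguments.

```python
# Default hyperparameter configuration, transcribed from the table above.
# The keys mirror the table; the optimizer class itself is not shown here.
bge_adam_defaults = {
    "lr": 0.001,              # learning rate
    "alpha": 0.5,             # weight of the predicted gradient vs. the actual gradient
    "betas": (0.9, 0.999),    # initial beta parameters
    "eps": 1e-8,              # regularization constant
    "weight_decay": 0,        # L2 regularization strength
    "entropy_weight": 0.01,   # strength of the entropy-based randomness
    "amsgrad": False,         # whether to use the AMSGrad variant
    "beta1_max": 0.9,
    "beta1_min": 0.5,
    "beta2_max": 0.999,
    "beta2_min": 0.9,
}
```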
| Software Resources | Version |
|---|---|
| Python | 3.11.3 |
| torch | 2.0.0+cu11.8 |
| torchvision | 0.15.1+cu11.8 |
| torchaudio | 2.0.1+cu11.8 |
| lightning | 2.0.4 |
| wandb | W&B Local 0.47.2 |
| Dataset | Total Samples | Training Set | Validation Set | Test Set | Number of Classes |
|---|---|---|---|---|---|
| CIFAR10 | 60,000 | 45,000 | 5,000 | 10,000 | 10 |
| MNIST | 70,000 | 55,000 | 5,000 | 10,000 | 10 |
| Medical | 1,885 | 1,400 | 200 | 285 | 8 |
| Dataset | Optimization Algorithm | Accuracy | Loss |
|---|---|---|---|
| MNIST | Adam | 99.23% | 0.04474 |
| MNIST | Adadelta | 83.70% | 0.4897 |
| MNIST | Adamax | 98.84% | 0.0629 |
| MNIST | Adagrad | 89.09% | 0.4041 |
| MNIST | Nadam | 99.30% | 0.03188 |
| MNIST | SGD | 97.11% | 0.1095 |
| MNIST | BGE-Adam | 99.34% | 0.0756 |
| CIFAR10 | Adam | 70.11% | 1.195 |
| CIFAR10 | Adadelta | 27.37% | 1.951 |
| CIFAR10 | Adamax | 55.45% | 1.693 |
| CIFAR10 | Adagrad | 29.58% | 1.974 |
| CIFAR10 | Nadam | 68.95% | 2.439 |
| CIFAR10 | SGD | 48.66% | 1.478 |
| CIFAR10 | BGE-Adam | 71.4% | 1.458 |
| Medical | Adam | 67.66% | 3.481 |
| Medical | Adadelta | 60.85% | 2.629 |
| Medical | Adamax | 66.81% | 2.657 |
| Medical | Adagrad | 67.23% | 2.401 |
| Medical | Nadam | 67.66% | 2.363 |
| Medical | SGD | 68.09% | 3.681 |
| Medical | BGE-Adam | 69.36% | 2.852 |