Training Acceleration Method Based on Parameter Freezing
Abstract
1. Introduction
- We design a training strategy that freezes model parameters according to their convergence trends observed during deep neural network training;
- We implement a linear freezing algorithm (LFA) that saves at least 19.4% of total training time (a minimal schedule sketch follows this list);
- We present an adaptive freezing algorithm (AFA) driven by gradient information, achieving speedup ratios of 1.38–1.40× across the three datasets.
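To make the linear schedule concrete, here is a minimal sketch of how the number of frozen leading layers could grow linearly over training. It is an illustration only; `start_epoch` and `max_frozen` are assumed hyperparameters, not the paper's settings.

```python
# Minimal sketch of a linear freezing schedule: the count of frozen leading
# layers grows linearly with the epoch index, capped at `max_frozen`.
# `start_epoch` and `max_frozen` are illustrative assumptions.
def linear_frozen_count(epoch: int, total_epochs: int, num_layers: int,
                        start_epoch: int = 5, max_frozen: int | None = None) -> int:
    """Return how many leading layers to freeze at `epoch` (0-indexed)."""
    if max_frozen is None:
        max_frozen = num_layers - 1   # keep at least the last layer trainable
    if epoch < start_epoch:
        return 0                      # warm-up: train the full network first
    span = max(total_epochs - start_epoch, 1)
    return min(max_frozen, (epoch - start_epoch) * max_frozen // span)
```

Freezing front-to-back reflects the common observation that shallow layers tend to converge earliest; the warm-up phase avoids freezing before any convergence trend is visible.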
2. Related Works
2.1. Remote Sensing Object Detection
2.2. Deep Neural Network Training Acceleration
2.2.1. Compression of Parameters
2.2.2. Compression of Model Structures
2.3. Similarity Measure between Deep Neural Network Representations
3. Pre-Experiment: Observation of Training Process
4. Parameter Freezing Algorithm
4.1. Linear Freezing Algorithm
4.2. Adaptive Freezing Algorithm
Algorithm 1 Adaptive Freezing Algorithm (AFA)

Input: total number of layers L; current number of frozen layers f; epoch counter T; the Frobenius norms of the layer gradients; the upper limit F on the number of frozen layers
Output: updated number of frozen layers f
1: T ← 0;
2: for each finished epoch do
3:   T ← T + 1;
4:   for layer index i = f + 1 to L do
5:     compute the Frobenius norm of the gradient of layer i;
6:   if the gradient norms of the leading unfrozen layers fall below the convergence threshold then
7:     f ← min(index of the last converged leading layer, F);
8:   for layer index i = 1 to f do
9:     freeze layer i;
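Algorithm 1 admits a compact implementation. The following is a minimal sketch under stated assumptions: the model's layers are held front-to-back in a Python list, a layer counts as converged once its gradient Frobenius norm falls below a fixed fraction (`ratio`) of its first-epoch value, and the names `baseline_norms`, `frozen`, and `ratio` are illustrative rather than the paper's notation.

```python
# A sketch of one adaptive freezing step (per Algorithm 1's loop body),
# assuming PyTorch and a front-to-back list of layers. A layer is treated
# as converged when its gradient Frobenius norm falls below `ratio` times
# its first-epoch norm; `baseline_norms` stores those first-epoch norms.
import torch.nn as nn

def grad_frobenius_norm(layer: nn.Module) -> float:
    """Frobenius norm over all parameter gradients currently stored in `layer`."""
    total = 0.0
    for param in layer.parameters():
        if param.grad is not None:
            total += param.grad.norm().item() ** 2   # .norm() = Frobenius norm
    return total ** 0.5

def adaptive_freeze(layers, baseline_norms, frozen, max_frozen, ratio=0.1):
    """Advance the freezing front past converged leading layers, capped at
    `max_frozen`; freeze everything before the front and return it."""
    front = frozen
    while front < min(len(layers), max_frozen):
        if grad_frobenius_norm(layers[front]) >= ratio * baseline_norms[front]:
            break   # this layer is still changing; later layers stay trainable
        front += 1
    for i in range(front):
        for param in layers[i].parameters():
            param.requires_grad_(False)   # frozen layers skip gradient updates
    return front
```

Calling `adaptive_freeze` once at the end of each epoch, after the final backward pass and before the gradients are cleared, mirrors the per-epoch structure of Algorithm 1.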
5. Experiments
5.1. Setup
5.1.1. Experimental Environment
5.1.2. Model
5.1.3. Datasets
5.1.4. Evaluation Metrics
5.2. Results
6. Limitations and Further Work
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Sun, X.; Wang, P.; Yan, Z.; Xu, F.; Wang, R.; Diao, W.; Chen, J.; Li, J.; Feng, Y.; Xu, T.; et al. FAIR1M: A benchmark dataset for fine-grained object recognition in high-resolution remote sensing imagery. ISPRS J. Photogramm. Remote Sens. 2022, 184, 116–130.
- Zou, Z.; Chen, K.; Shi, Z.; Guo, Y.; Ye, J. Object detection in 20 years: A survey. Proc. IEEE 2023, 111, 257–276.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 1137–1149.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 779–788.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of Computer Vision—ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Part I; pp. 21–37.
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901.
- Dehghani, M.; Djolonga, J.; Mustafa, B.; Padlewski, P.; Heek, J.; Gilmer, J.; Steiner, A.; Caron, M.; Geirhos, R.; Alabdulmohsin, I.; et al. Scaling vision transformers to 22 billion parameters. arXiv 2023, arXiv:2302.05442.
- Zou, Z.; Shi, Z. Random access memories: A new paradigm for target detection in high resolution aerial remote sensing images. IEEE Trans. Image Process. 2017, 27, 1100–1111.
- Xia, G.-S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A large-scale dataset for object detection in aerial images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3974–3983.
- Li, K.; Wan, G.; Cheng, G.; Meng, L.; Han, J. Object detection in optical remote sensing images: A survey and a new benchmark. ISPRS J. Photogramm. Remote Sens. 2020, 159, 296–307.
- Cheng, G.; Zhou, P.; Han, J. Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 7405–7415.
- Cheng, G.; Han, J.; Zhou, P.; Xu, D. Learning rotation-invariant and Fisher discriminative convolutional neural networks for object detection. IEEE Trans. Image Process. 2018, 28, 265–278.
- Long, Y.; Gong, Y.; Xiao, Z.; Liu, Q. Accurate object localization in remote sensing images based on convolutional neural networks. IEEE Trans. Geosci. Remote Sens. 2017, 55, 2486–2498.
- Li, K.; Cheng, G.; Bu, S.; You, X. Rotation-insensitive and context-augmented object detection in remote sensing images. IEEE Trans. Geosci. Remote Sens. 2017, 56, 2337–2348.
- Xu, Z.; Xu, X.; Wang, L.; Yang, R.; Pu, F. Deformable ConvNet with aspect ratio constrained NMS for object detection in remote sensing imagery. Remote Sens. 2017, 9, 1312.
- Zhong, Y.; Han, X.; Zhang, L. Multi-class geospatial object detection based on a position-sensitive balancing framework for high spatial resolution remote sensing imagery. ISPRS J. Photogramm. Remote Sens. 2018, 138, 281–294.
- Pang, J.; Chen, K.; Shi, J.; Feng, H.; Ouyang, W.; Lin, D. Libra R-CNN: Towards balanced learning for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 821–830.
- Qin, R.; Liu, Q.; Gao, G.; Huang, D.; Wang, Y. MRDet: A multihead network for accurate rotated object detection in aerial images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–12.
- Liu, L.; Pan, Z.; Lei, B. Learning a rotation invariant detector with rotatable bounding box. arXiv 2017, arXiv:1711.09405.
- Tang, T.; Zhou, S.; Deng, Z.; Lei, L.; Zou, H. Arbitrary-oriented vehicle detection in aerial imagery with single convolutional neural networks. Remote Sens. 2017, 9, 1170.
- Liu, W.; Ma, L.; Chen, H. Arbitrary-oriented ship detection framework in optical remote-sensing images. IEEE Geosci. Remote Sens. Lett. 2018, 15, 937–941.
- Zhong, J.; Lei, T.; Yao, G. Robust vehicle detection in aerial images based on cascaded convolutional neural networks. Sensors 2017, 17, 2720.
- Xu, T.; Sun, X.; Diao, W.; Zhao, L.; Fu, K.; Wang, H. ASSD: Feature aligned single-shot detection for multiscale objects in aerial imagery. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5607117.
- LeCun, Y.; Denker, J.; Solla, S. Optimal brain damage. Adv. Neural Inf. Process. Syst. 1989, 2, 598–605.
- Wen, W.; Wu, C.; Wang, Y.; Chen, Y.; Li, H. Learning structured sparsity in deep neural networks. Adv. Neural Inf. Process. Syst. 2016, 29, 2082–2090.
- Han, S.; Pool, J.; Tran, J.; Dally, W.J. Learning both weights and connections for efficient neural network. Adv. Neural Inf. Process. Syst. 2015, 28, 1135–1143.
- Courbariaux, M.; Bengio, Y.; David, J.-P. BinaryConnect: Training deep neural networks with binary weights during propagations. Adv. Neural Inf. Process. Syst. 2015, 28, 3123–3131.
- Li, F.; Liu, B.; Wang, X.; Zhang, B.; Yan, J. Ternary weight networks. arXiv 2016, arXiv:1605.04711.
- Liu, B.; Wang, M.; Foroosh, H.; Tappen, M.; Penksy, M. Sparse convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 806–814.
- Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50× fewer parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
- Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6848–6856.
- Buciluǎ, C.; Caruana, R.; Niculescu-Mizil, A. Model compression. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, 20–23 August 2006; pp. 535–541.
- Romero, A.; Ballas, N.; Kahou, S.E.; Chassang, A.; Gatta, C.; Bengio, Y. FitNets: Hints for thin deep nets. Proc. ICLR 2015.
- Yim, J.; Joo, D.; Bae, J.; Kim, J. A gift from knowledge distillation: Fast optimization, network minimization and transfer learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4133–4141.
- Raghu, M.; Gilmer, J.; Yosinski, J.; Sohl-Dickstein, J. SVCCA: Singular vector canonical correlation analysis for deep learning dynamics and interpretability. Adv. Neural Inf. Process. Syst. 2017, 30, 6078–6087.
- Kornblith, S.; Norouzi, M.; Lee, H.; Hinton, G. Similarity of neural network representations revisited. In Proceedings of the International Conference on Machine Learning (PMLR), Long Beach, CA, USA, 9–15 June 2019; pp. 3519–3529.
- Li, Y.; Mao, H.; Girshick, R.; He, K. Exploring plain vision transformer backbones for object detection. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 280–296.
Statistics of the datasets used in the experiments.

Dataset | Categories | Images | Instances | Image Width (px) |
---|---|---|---|---|
DIOR | 20 | 23,463 | 192,472 | 800 |
SIMD | 15 | 5000 | 45,096 | 1024 |
RSOD | 4 | 976 | 6950 | ~1000 |
Training time and detection accuracy on the DIOR dataset.

Freezing Algorithm | Time Cost at Epoch 1 (s) | Time Cost at Epoch 31 (s) | Time Cost at Epoch 61 (s) | Average Time per Epoch (s) | Total Training Time (s) | AR@10 | mAP | Speedup |
---|---|---|---|---|---|---|---|---|
Without | 6904.83 | 6959.95 | 6642.45 | 6895.93 | 448,235.56 | 0.50 | 78.8 | 1× |
LFA | 6875.81 | 5527.05 | 4075.56 | 5554.59 | 361,048.18 | 0.49 | 76.7 | 1.24× |
AFA | 6891.45 | 4763.51 | 4074.37 | 5010.23 | 325,664.90 | 0.48 | 75.9 | 1.38× |
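As a consistency check, the Speedup column matches the ratio of total training times; with the totals from this table,

$$
\mathrm{Speedup}_{\mathrm{LFA}} = \frac{448{,}235.56}{361{,}048.18} \approx 1.24\times, \qquad \mathrm{Speedup}_{\mathrm{AFA}} = \frac{448{,}235.56}{325{,}664.90} \approx 1.38\times .
$$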
Training time and detection accuracy on the SIMD dataset.

Freezing Algorithm | Time Cost at Epoch 1 (s) | Time Cost at Epoch 31 (s) | Time Cost at Epoch 61 (s) | Average Time per Epoch (s) | Total Training Time (s) | AR@10 | mAP | Speedup |
---|---|---|---|---|---|---|---|---|
Without | 2369.53 | 2405.23 | 2313.78 | 2381.71 | 154,811.11 | 0.60 | 87.1 | 1× |
LFA | 2370.68 | 1898.20 | 1380.30 | 1909.06 | 124,088.89 | 0.61 | 88.0 | 1.25× |
AFA | 2369.44 | 1637.21 | 1381.68 | 1712.61 | 111,319.42 | 0.60 | 87.8 | 1.39× |
Training time and detection accuracy on the RSOD dataset.

Freezing Algorithm | Time Cost at Epoch 1 (s) | Time Cost at Epoch 31 (s) | Time Cost at Epoch 61 (s) | Average Time per Epoch (s) | Total Training Time (s) | AR@10 | mAP | Speedup |
---|---|---|---|---|---|---|---|---|
Without | 443.98 | 451.22 | 435.28 | 444.29 | 28,878.98 | 0.53 | 90.5 | 1× |
LFA | 446.96 | 360.38 | 252.15 | 357.72 | 23,251.98 | 0.54 | 92.1 | 1.24× |
AFA | 447.04 | 302.75 | 250.98 | 316.87 | 20,596.85 | 0.54 | 91.9 | 1.40× |
mAP comparison with other detectors on the DIOR dataset.

Method | mAP |
---|---|
Eff-Det | 66.1 |
RSADet | 72.2 |
R2IPoints | 74.6 |
SFSANet | 76.6 |
ViTDet | 78.8 |
ViTDet with LFA (ours) | 76.7 |
ViTDet with AFA (ours) | 75.9 |
mAP comparison with other detectors on the SIMD dataset.

Method | mAP |
---|---|
Faster R-CNN | 70.8 |
YOLOX-s | 77.4 |
YOLOv7-tiny | 82.2 |
MAY | 78.2 |
ViTDet | 87.1 |
ViTDet with LFA (ours) | 88.0 |
ViTDet with AFA (ours) | 87.8 |
mAP comparison with other detectors on the RSOD dataset.

Method | mAP |
---|---|
CFA-Net | 72.8 |
RoI-Trans | 81.8 |
YOLOv7 | 84.4 |
URSNet | 87.2 |
ViTDet | 90.5 |
ViTDet with LFA (ours) | 92.1 |
ViTDet with AFA (ours) | 91.9 |