Class-Center-Based Self-Knowledge Distillation: A Simple Method to Reduce Intra-Class Variance
Abstract
1. Introduction
- A simple visualization method is designed to observe how penultimate-layer features and their gradients change during training, providing intuition about how different training methods influence feature learning.
- Using this visualization method, it is identified that reducing the intra-class variation of features is the key mechanism behind inter-sample self-distillation.
- Based on the above finding, a simple but effective self-distillation technique built on class centers is proposed (a minimal sketch of such a loss is given after this list). It is further demonstrated that reducing intra-class variance through self-distillation helps model generalization.
- Inter-sample self-distillation and our CSD are experimentally demonstrated to bring larger gains on fine-grained classification tasks.
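To make the class-center-based objective concrete, the following is a minimal PyTorch sketch of one way such a loss can be implemented. The exact loss form, center-update rule, and hyperparameter values used in the paper are not reproduced in this outline, so the EMA center update and the `momentum`, `temperature`, and `weight` parameters below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


class CenterSelfDistillation(torch.nn.Module):
    """Illustrative sketch: maintain an EMA class center per class in the
    penultimate feature space and distill each sample's prediction toward the
    prediction obtained by feeding its class center through the same classifier."""

    def __init__(self, num_classes, feat_dim, momentum=0.9, temperature=4.0, weight=0.9):
        super().__init__()
        self.register_buffer("centers", torch.zeros(num_classes, feat_dim))
        self.momentum = momentum        # EMA factor for the class centers (assumed value)
        self.temperature = temperature  # softening temperature for distillation (assumed value)
        self.weight = weight            # balance between CE and distillation terms (assumed form)

    @torch.no_grad()
    def update_centers(self, feats, labels):
        # Exponential moving average of the per-class mean features in the current batch.
        for c in labels.unique():
            batch_mean = feats[labels == c].mean(dim=0)
            self.centers[c] = self.momentum * self.centers[c] + (1.0 - self.momentum) * batch_mean

    def forward(self, feats, logits, labels, classifier):
        # "Teacher" logits come from the detached class centers of the ground-truth classes.
        self.update_centers(feats.detach(), labels)
        with torch.no_grad():
            center_logits = classifier(self.centers[labels])
        ce = F.cross_entropy(logits, labels)
        kd = F.kl_div(
            F.log_softmax(logits / self.temperature, dim=1),
            F.softmax(center_logits / self.temperature, dim=1),
            reduction="batchmean",
        ) * self.temperature ** 2
        return (1.0 - self.weight) * ce + self.weight * kd
```

With `weight = 0` this objective reduces to plain cross-entropy, which is consistent with the ablation in Section 4.6, where the leftmost setting reproduces the cross-entropy baseline; the exact hyperparameter ablated there is not named in this outline.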
2. Related Work
3. Methods
3.1. Visualization of Update Tendencies for Penultimate Layer Features
3.2. How Inter-Sample Self-Distillation Affects Feature Learning
3.2.1. Training with Cross-Entropy
3.2.2. Training with Inter-Sample Self-Distillation
3.3. Center Self-Distillation
4. Experiments
4.1. Dataset
4.2. Network Architectures
4.3. Implementation Details
4.4. Evaluation Metrics
4.5. Quantitative Results
4.6. Ablation Study
| Measurement | Method | CIFAR-100 | TinyImageNet | CUB-200-2011 | Stanford Dogs | MIT67 |
|---|---|---|---|---|---|---|
| Top-5 ↓ | Cross-entropy | 6.91±0.09 | 22.21±0.29 | 22.30±0.68 | 11.80±0.27 | 19.25±0.53 |
| | AdaCos | 9.99±0.20 | 22.24±0.11 | 15.24±0.66 | 11.02±0.22 | 19.05±2.33 |
| | Virtual-softmax | 8.54±0.11 | 24.15±0.17 | 13.16±0.20 | 8.64±0.21 | 19.10±0.20 |
| | Maximum-entropy | 7.29±0.12 | 21.53±0.50 | 19.80±1.21 | 10.90±0.31 | 20.47±0.90 |
| | Label-smoothing | 7.18±0.08 | 20.74±0.31 | 22.40±0.85 | 13.41±0.40 | 19.53±0.75 |
| | CS-KD | 5.69±0.03 | 19.21±0.04 | 13.07±0.26 | 8.55±0.07 | 17.46±0.38 |
| | BAKE | 7.45±0.06 | 20.06±0.34 | 13.00±0.41 | 10.23±0.13 | 18.43±0.39 |
| | CSD (ours) | 5.34±0.09 | 19.39±0.23 | 9.49±0.24 | 7.09±0.27 | 16.77±0.71 |
| R@1 ↑ | Cross-entropy | 61.38±0.64 | 30.59±0.42 | 33.92±1.70 | 47.51±1.02 | 31.42±1.00 |
| | AdaCos | 67.95±0.42 | 44.66±0.52 | 54.86±0.24 | 58.37±0.43 | 42.39±1.91 |
| | Virtual-softmax | 68.35±0.48 | 44.69±0.58 | 55.56±0.74 | 59.71±0.56 | 44.20±0.90 |
| | Maximum-entropy | 71.51±0.29 | 39.18±0.79 | 48.66±2.10 | 60.05±0.45 | 38.06±3.32 |
| | Label-smoothing | 71.44±0.03 | 34.79±0.67 | 41.59±0.94 | 54.48±0.68 | 35.15±1.54 |
| | CS-KD | 71.15±0.15 | 47.15±0.40 | 59.06±0.38 | 62.67±0.07 | 46.74±1.48 |
| | BAKE | 71.24±0.66 | 45.23±0.34 | 62.55±0.91 | 64.72±0.43 | 46.12±0.45 |
| | CSD (ours) | 72.22±0.21 | 48.05±0.17 | 65.85±0.32 | 63.97±0.62 | 50.65±1.10 |
| Mstd | Cross-entropy | 17.24±1.23 | 11.84±0.86 | 9.26±0.66 | 8.96±0.35 | 6.52±0.72 |
| | Label-smoothing | 5.54±0.45 | 8.49±0.41 | 5.68±0.38 | 4.93±0.26 | 3.95±0.65 |
| | CS-KD | 5.91±0.34 | 5.92±0.64 | 4.18±0.29 | 3.64±0.66 | 2.91±0.31 |
| | BAKE | 3.85±0.32 | 5.42±0.48 | 3.83±0.29 | 2.70±0.19 | 2.19±0.11 |
| | CSD (ours) | 5.84±0.16 | 7.17±0.47 | 4.87±0.09 | 4.47±0.29 | 3.36±0.33 |
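For reference, the sketch below shows one way the retrieval and variance measurements reported in this table could be computed from penultimate-layer features. The definitions actually used appear in Section 4.4 of the paper and are not reproduced in this outline, so treating R@1 as nearest-neighbour retrieval accuracy under cosine similarity and Mstd as the mean per-class standard deviation of features is an assumption.

```python
import torch
import torch.nn.functional as F


def recall_at_1(feats: torch.Tensor, labels: torch.Tensor) -> float:
    """Assumed R@1: fraction of samples whose nearest neighbour in feature space
    (cosine similarity, self-match excluded) shares the same class label."""
    feats = F.normalize(feats, dim=1)
    sim = feats @ feats.t()
    sim.fill_diagonal_(float("-inf"))  # exclude each sample from being its own neighbour
    nn_idx = sim.argmax(dim=1)
    return (labels[nn_idx] == labels).float().mean().item()


def mean_intra_class_std(feats: torch.Tensor, labels: torch.Tensor) -> float:
    """Assumed Mstd: average over classes of the mean per-dimension standard
    deviation of that class's features (a proxy for intra-class variance)."""
    per_class = []
    for c in labels.unique():
        class_feats = feats[labels == c]
        if class_feats.size(0) > 1:  # std is undefined for a single sample
            per_class.append(class_feats.std(dim=0).mean())
    return torch.stack(per_class).mean().item()
```

Under this reading, a lower Mstd indicates tighter per-class feature clusters, which is exactly the intra-class variance the paper aims to reduce.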
4.7. Calibration Effects
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Buciluǎ, C.; Caruana, R.; Niculescu-Mizil, A. Model compression. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, 20–23 August 2006; pp. 535–541. [Google Scholar]
- Hinton, G.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. arXiv 2015, arXiv:1503.02531. [Google Scholar]
- Papernot, N.; McDaniel, P.; Wu, X.; Jha, S.; Swami, A. Distillation as a defense to adversarial perturbations against deep neural networks. In Proceedings of the 2016 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–26 May 2016; pp. 582–597. [Google Scholar]
- Yim, J.; Joo, D.; Bae, J.; Kim, J. A gift from knowledge distillation: Fast optimization, network minimization and transfer learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4133–4141. [Google Scholar]
- Yu, R.; Li, A.; Morariu, V.I.; Davis, L.S. Visual relationship detection with internal and external linguistic knowledge distillation. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1974–1982. [Google Scholar]
- Hu, B.; Zhou, S.; Xiong, Z.; Wu, F. Cross-Resolution Distillation for Efficient 3D Medical Image Registration. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 7269–7283. [Google Scholar] [CrossRef]
- Liu, T.; Lam, K.M.; Zhao, R.; Qiu, G. Deep Cross-Modal Representation Learning and Distillation for Illumination-Invariant Pedestrian Detection. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 315–329. [Google Scholar] [CrossRef]
- Ahn, S.; Hu, S.X.; Damianou, A.; Lawrence, N.D.; Dai, Z. Variational information distillation for knowledge transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9163–9171. [Google Scholar]
- Furlanello, T.; Lipton, Z.; Tschannen, M.; Itti, L.; Anandkumar, A. Born again neural networks. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 1607–1616. [Google Scholar]
- Yang, C.; Xie, L.; Qiao, S.; Yuille, A.L. Training deep neural networks in generations: A more tolerant teacher educates better students. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 5628–5635. [Google Scholar]
- Zhang, Z.; Sabuncu, M. Self-distillation as instance-specific label smoothing. Adv. Neural Inf. Process. Syst. 2020, 33, 2184–2195. [Google Scholar]
- Mobahi, H.; Farajtabar, M.; Bartlett, P. Self-distillation amplifies regularization in hilbert space. Adv. Neural Inf. Process. Syst. 2020, 33, 3351–3361. [Google Scholar]
- Abnar, S.; Dehghani, M.; Zuidema, W. Transferring inductive biases through knowledge distillation. arXiv 2020, arXiv:2006.00555. [Google Scholar]
- Zhou, C.; Neubig, G.; Gu, J. Understanding knowledge distillation in non-autoregressive machine translation. arXiv 2019, arXiv:1911.02727. [Google Scholar]
- Yun, S.; Park, J.; Lee, K.; Shin, J. Regularizing class-wise predictions via self-knowledge distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 13876–13885. [Google Scholar]
- Ge, Y.; Choi, C.L.; Zhang, X.; Zhao, P.; Zhu, F.; Zhao, R.; Li, H. Self-distillation with Batch Knowledge Ensembling Improves ImageNet Classification. arXiv 2021, arXiv:2104.13298. [Google Scholar]
- Park, W.; Kim, D.; Lu, Y.; Cho, M. Relational knowledge distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3967–3976. [Google Scholar]
- Müller, R.; Kornblith, S.; Hinton, G.E. When does label smoothing help? Adv. Neural Inf. Process. Syst. 2019, 32, 4694–4703. [Google Scholar]
- Wen, Y.; Zhang, K.; Li, Z.; Qiao, Y. A discriminative feature learning approach for deep face recognition. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 499–515. [Google Scholar]
- Wang, L.; Zhang, L.; Qi, X.; Yi, Z. Deep Attention-Based Imbalanced Image Classification. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 3320–3330. [Google Scholar] [CrossRef] [PubMed]
- Zhang, L.; Yi, Z.; Amari, S.I. Theoretical study of oscillator neurons in recurrent neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 5242–5248. [Google Scholar] [CrossRef] [PubMed]
- Qi, X.; Zhang, L.; Chen, Y.; Pi, Y.; Chen, Y.; Lv, Q.; Yi, Z. Automated diagnosis of breast ultrasonography images using deep neural networks. Med. Image Anal. 2019, 52, 185–198. [Google Scholar] [CrossRef] [PubMed]
- Wang, Z.; Shu, X.; Chen, C.; Teng, Y.; Zhang, L.; Xu, J. A semi-symmetric domain adaptation network based on multi-level adversarial features for meningioma segmentation. Knowl.-Based Syst. 2021, 228, 107245. [Google Scholar] [CrossRef]
- Ba, J.; Caruana, R. Do deep nets really need to be deep? Adv. Neural Inf. Process. Syst. 2014, 27, 2654–2662. [Google Scholar]
- Zhang, K.; Zhang, C.; Li, S.; Zeng, D.; Ge, S. Student Network Learning via Evolutionary Knowledge Distillation. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 2251–2263. [Google Scholar] [CrossRef]
- Cui, X.; Wang, C.; Ren, D.; Chen, Y.; Zhu, P. Semi-supervised Image Deraining Using Knowledge Distillation. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 8327–8341. [Google Scholar] [CrossRef]
- Romero, A.; Ballas, N.; Kahou, S.E.; Chassang, A.; Gatta, C.; Bengio, Y. FitNets: Hints for thin deep nets. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015. [Google Scholar] [CrossRef]
- Heo, B.; Kim, J.; Yun, S.; Park, H.; Kwak, N.; Choi, J.Y. A comprehensive overhaul of feature distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1921–1930. [Google Scholar]
- Srinivas, S.; Fleuret, F. Knowledge transfer with jacobian matching. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 4723–4731. [Google Scholar]
- Wang, Z.; Shu, X.; Wang, Y.; Feng, Y.; Zhang, L.; Yi, Z. A feature space-restricted attention attack on medical deep learning systems. IEEE Trans. Cybern. 2022, 53, 5323–5335. [Google Scholar] [CrossRef] [PubMed]
- Yuan, L.; Tay, F.E.; Li, G.; Wang, T.; Feng, J. Revisiting knowledge distillation via label smoothing regularization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 3903–3911. [Google Scholar]
- Yang, C.; Xie, L.; Su, C.; Yuille, A.L. Snapshot distillation: Teacher-student optimization in one generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2859–2868. [Google Scholar]
- Kim, K.; Ji, B.; Yoon, D.; Hwang, S. Self-knowledge distillation with progressive refinement of targets. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 6567–6576. [Google Scholar]
- Gotmare, A.; Keskar, N.S.; Xiong, C.; Socher, R. A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
- Zhang, L.; Song, J.; Gao, A.; Chen, J.; Bao, C.; Ma, K. Be your own teacher: Improve the performance of convolutional neural networks via self distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3713–3722. [Google Scholar]
- Xu, T.B.; Liu, C.L. Data-distortion guided self-distillation for deep neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 5565–5572. [Google Scholar]
- Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
- Bagherinezhad, H.; Horton, M.; Rastegari, M.; Farhadi, A. Label refinery: Improving imagenet classification through label progression. arXiv 2018, arXiv:1805.02641. [Google Scholar]
- Beyer, L.; Hénaff, O.J.; Kolesnikov, A.; Zhai, X.; Oord, A.v.D. Are we done with imagenet? arXiv 2020, arXiv:2006.07159. [Google Scholar]
- Yun, S.; Oh, S.J.; Heo, B.; Han, D.; Choe, J.; Chun, S. Re-labeling imagenet: From single to multi-labels, from global to localized labels. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 2340–2350. [Google Scholar]
- Wah, C.; Branson, S.; Welinder, P.; Perona, P.; Belongie, S. The Caltech-UCSD Birds-200-2011 Dataset; CNS-TR-2011-001; California Institute of Technology: Pasadena, CA, USA, 2011. [Google Scholar]
- Maji, S.; Rahtu, E.; Kannala, J.; Blaschko, M.; Vedaldi, A. Fine-grained visual classification of aircraft. arXiv 2013, arXiv:1306.5151. [Google Scholar]
- Krause, J.; Stark, M.; Deng, J.; Li, F.-F. 3D object representations for fine-grained categorization. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Sydney, Australia, 1–8 December 2013; pp. 554–561. [Google Scholar]
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Li, F.-F. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
- Martin, C.H.; Mahoney, M.W. Implicit self-regularization in deep neural networks: Evidence from random matrix theory and implications for learning. J. Mach. Learn. Res. 2021, 22, 1–73. [Google Scholar]
- Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; Technical Report; University of Toronto: Toronto, ON, Canada, 2009. [Google Scholar]
- Quattoni, A.; Torralba, A. Recognizing indoor scenes. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 413–420. [Google Scholar]
- Khosla, A.; Jayadevaprakash, N.; Yao, B.; Li, F.F. Novel dataset for fine-grained image categorization: Stanford dogs. In Proceedings of the CVPR Workshop on Fine-Grained Visual Categorization (FGVC), Colorado Springs, CO, USA, June 2011; Volume 2. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Identity mappings in deep residual networks. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 630–645. [Google Scholar]
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. Pytorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019, 32, 8026–8037. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25. Available online: https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf (accessed on 19 July 2024). [CrossRef]
- Kumar, A.; Singh, S.S.; Singh, K.; Biswas, B. Link prediction techniques, applications, and performance: A survey. Phys. A Stat. Mech. Its Appl. 2020, 553, 124289. [Google Scholar] [CrossRef]
- Guo, C.; Pleiss, G.; Sun, Y.; Weinberger, K.Q. On calibration of modern neural networks. In Proceedings of the International Conference on Machine Learning, PMLR, Sydney, Australia, 6–11 August 2017; pp. 1321–1330. [Google Scholar]
- Naeini, M.P.; Cooper, G.; Hauskrecht, M. Obtaining well calibrated probabilities using bayesian binning. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015. [Google Scholar]
- Lee, D.K.; In, J.; Lee, S. Standard deviation and standard error of the mean. Korean J. Anesthesiol. 2015, 68, 220–223. [Google Scholar] [CrossRef] [PubMed]
- Zhang, X.; Zhao, R.; Qiao, Y.; Wang, X.; Li, H. Adacos: Adaptively scaling cosine logits for effectively learning deep face representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 10823–10832. [Google Scholar]
- Chen, B.; Deng, W.; Shen, H. Virtual class enhanced discriminative embedding learning. Adv. Neural Inf. Process. Syst. 2018, 31. Available online: https://proceedings.neurips.cc/paper_files/paper/2018/file/d79aac075930c83c2f1e369a511148fe-Paper.pdf (accessed on 19 July 2024).
- Dubey, A.; Gupta, O.; Raskar, R.; Naik, N. Maximum-entropy fine grained classification. Adv. Neural Inf. Process. Syst. 2018, 31. Available online: https://proceedings.neurips.cc/paper_files/paper/2018/file/0c74b7f78409a4022a2c4c5a5ca3ee19-Paper.pdf (accessed on 19 July 2024).
- Zhou, W.; Li, H.; Tian, Q. Recent advance in content-based image retrieval: A literature survey. arXiv 2017, arXiv:1706.06064. [Google Scholar]
- Niculescu-Mizil, A.; Caruana, R. Predicting good probabilities with supervised learning. In Proceedings of the 22nd international Conference on Machine Learning, Bonn, Germany, 7–11 August 2005; pp. 625–632. [Google Scholar]
- Chang, D.; Pang, K.; Zheng, Y.; Ma, Z.; Song, Y.Z.; Guo, J. Your “Flamingo” is My “Bird”: Fine-Grained, or Not. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 11476–11485. [Google Scholar]
| Dataset | Classes | Training Images | Test Images |
|---|---|---|---|
| CIFAR-100 | 100 | 50,000 | 10,000 |
| TinyImageNet | 200 | 100,000 | 10,000 |
| CUB-200-2011 | 200 | 5,997 | 5,794 |
| Stanford Dogs | 120 | 12,000 | 8,580 |
| MIT67 | 67 | 5,360 | 1,340 |
| Model | Method | CIFAR-100 | TinyImageNet | CUB-200-2011 | Stanford Dogs | MIT67 |
|---|---|---|---|---|---|---|
| ResNet-18 | Cross-entropy | 24.71±0.24 | 43.53±0.19 | 46.00±1.43 | 36.29±0.32 | 44.75±0.80 |
| | AdaCos | 23.71±0.36 | 42.61±0.20 | 35.47±0.07 | 32.66±0.34 | 42.66±0.43 |
| | Virtual-softmax | 23.01±0.42 | 42.41±0.20 | 35.03±0.51 | 31.48±0.16 | 42.86±0.71 |
| | Maximum-entropy | 22.72±0.29 | 41.77±0.13 | 39.86±1.11 | 32.41±0.20 | 43.36±1.62 |
| | DDGSD | 23.85±1.57 | 41.48±0.12 | 41.17±1.28 | 31.53±0.54 | 41.17±2.62 |
| | BYOT | 23.81±0.11 | 44.88±0.46 | 40.76±0.39 | 34.02±0.14 | 44.88±0.46 |
| | Label-smoothing | 22.69±0.28 | 43.09±0.34 | 42.99±0.99 | 35.30±0.66 | 44.40±0.71 |
| | Center-loss | 22.89±0.46 | 42.43±0.14 | 37.21±0.13 | 36.07±0.28 | 44.41±0.56 |
| | CS-KD | 21.99±0.13 | 41.62±0.38 | 33.28±0.99 | 30.85±0.28 | 40.45±0.45 |
| | BAKE | 21.28±0.15 | 41.71±0.21 | 29.74±0.70 | 30.20±0.11 | 39.95±0.11 |
| | CSD (ours) | 21.19±0.05 | 40.98±0.23 | 28.03±0.08 | 28.84±0.58 | 37.96±0.52 |
| DenseNet-121 | Cross-entropy | 22.23±0.04 | 39.22±0.27 | 42.30±0.44 | 33.39±0.17 | 41.79±0.19 |
| | AdaCos | 22.17±0.24 | 38.76±0.23 | 30.84±0.38 | 27.87±0.65 | 40.25±0.68 |
| | Virtual-softmax | 23.66±0.10 | 41.58±1.58 | 33.85±0.75 | 30.55±0.72 | 43.66±0.30 |
| | Maximum-entropy | 22.87±0.45 | 38.39±0.33 | 37.51±0.71 | 29.52±0.74 | 43.48±1.30 |
| | Label-smoothing | 21.88±0.45 | 38.75±0.18 | 40.63±0.24 | 31.39±0.46 | 42.24±1.23 |
| | CS-KD | 21.69±0.49 | 37.96±0.09 | 30.83±0.39 | 27.81±0.13 | 40.02±0.91 |
| | BAKE | 20.74±0.19 | 37.07±0.24 | 28.79±1.30 | 27.66±0.05 | 39.15±0.37 |
| | CSD (ours) | 20.27±0.39 | 37.05±0.29 | 27.23±0.40 | 26.57±0.52 | 36.08±0.46 |
| Image Size | Method | Training Time Cost (s/epoch) | Memory Consumption (GB) |
|---|---|---|---|
| 224 × 224 | backbone | 128.74 | 20.95 |
| | CSD (ours) | 132.53 | 21.84 |
| 32 × 32 | backbone | 71.53 | 11.56 |
| | CSD (ours) | 72.87 | 12.01 |
| | 0 | 0.1 | 0.3 | 0.5 | 0.7 | 0.9 |
|---|---|---|---|---|---|---|
| Top-1 | 24.71 ± 0.24 | 23.92 ± 0.19 | 23.19 ± 0.11 | 22.52 ± 0.07 | 21.98 ± 0.09 | 21.19 ± 0.05 |
| Mstd | 5.53 ± 0.42 | 5.59 ± 0.30 | 5.65 ± 0.23 | 5.71 ± 0.19 | 5.79 ± 0.19 | 5.84 ± 0.16 |