A Long-Tailed Image Classification Method Based on Enhanced Contrastive Visual Language
Abstract
1. Introduction
2. Related Work
2.1. Data Re-Sampling
2.2. Data Re-Weighting
2.3. Data Augmentation
2.4. Transfer Learning
2.5. Ensemble Learning
3. Method
3.1. The Overall Framework
3.2. Contrasting Visual-Language Pre-Training Model
3.3. Balanced Linear Adapters
3.4. Algorithm Description
Algorithm 1: Phase I
Input: I_input = {images, labels}, T_input = {texts, labels}
Output: model_weight
1: for epoch = 1 to max_epoch do
2:   T = Encode(labels, texts)
3:   I = Encode(labels, images)
4:   Train(model, I)
5:   Eval(model, images, labels)
6:   Logits(I, T)
7:   pth_epoch = {weight}
8: end for
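The Phase I procedure can be read as contrastive fine-tuning of a CLIP-style vision-language model on the long-tailed training set: label names are encoded as text prompts, images are encoded by the visual branch, and the image-text similarity logits drive the loss. The following PyTorch sketch is a minimal illustration of that loop under those assumptions; the prompt template, optimizer settings, and best-checkpoint selection are illustrative, not the authors' exact implementation.

```python
# Minimal Phase I sketch: contrastive fine-tuning of a CLIP-like model.
# Assumptions: `model` exposes encode_image / encode_text and a learnable
# logit_scale (as in CLIP); `tokenizer` maps prompt strings to token tensors.
import torch
import torch.nn.functional as F

def phase_one(model, tokenizer, train_loader, class_names,
              max_epoch=50, lr=1e-3, device="cuda"):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    prompts = [f"a photo of a {name}." for name in class_names]  # T = Encode(labels, texts)
    tokens = tokenizer(prompts).to(device)
    best_acc, best_state = 0.0, None

    for epoch in range(max_epoch):
        model.train()
        correct, total = 0, 0
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            img_feat = F.normalize(model.encode_image(images), dim=-1)  # I = Encode(labels, images)
            txt_feat = F.normalize(model.encode_text(tokens), dim=-1)
            logits = model.logit_scale.exp() * img_feat @ txt_feat.t()  # Logits(I, T)
            loss = F.cross_entropy(logits, labels)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            correct += (logits.argmax(dim=-1) == labels).sum().item()
            total += labels.numel()

        acc = correct / max(total, 1)                 # Eval(model, images, labels)
        if acc > best_acc:                            # keep the best epoch weights
            best_acc = acc
            best_state = {k: v.cpu() for k, v in model.state_dict().items()}
            torch.save(best_state, "phase1_best.pth")  # pth_epoch = {weight}
    return best_state
```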
Algorithm 2: Phase II
Input: I_input = {images, labels}, T_input = {texts, labels}, model_stage1
Output: weight
1: model = Load(best_model)
2: for epoch = 1 to max_epoch do
3:   if epoch >= 2 then
4:     I = Rebalance(Momentum)
5:   end if
6:   Momentum = model(I, labels, epoch)
7:   Train(model, I)
8:   Eval(model, images, labels)
9:   Logit(model, I, T)
10:  pth_epoch = {weight}
11: end for
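Phase II loads the best Phase I weights, keeps the backbone frozen, trains a balanced linear adapter, and from the second epoch onward re-samples the training images according to a per-class statistic tracked with momentum. The sketch below is one plausible reading of that loop; the adapter structure, the use of per-class mean confidence as the momentum statistic, and the inverse-weight re-sampling rule are illustrative assumptions rather than the authors' exact design.

```python
# Minimal Phase II sketch: frozen backbone + linear adapter + momentum-guided
# re-sampling. `txt_feat` is assumed to be the normalized text features from Phase I.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, WeightedRandomSampler

class LinearAdapter(nn.Module):
    """Residual linear layer applied to frozen image features."""
    def __init__(self, dim, alpha=0.2):
        super().__init__()
        self.fc = nn.Linear(dim, dim)
        self.alpha = alpha

    def forward(self, x):
        return self.alpha * self.fc(x) + (1 - self.alpha) * x

def rebalanced_loader(dataset, labels, class_stat, batch_size=128):
    """I = Rebalance(Momentum): sample each class inversely to its tracked statistic."""
    weights = torch.tensor([1.0 / (class_stat[y] + 1e-6) for y in labels], dtype=torch.float)
    sampler = WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)

def phase_two(model, adapter, dataset, labels, txt_feat, num_classes,
              max_epoch=10, stat_momentum=0.9, device="cuda"):
    model.eval()                                      # backbone stays frozen
    optimizer = torch.optim.SGD(adapter.parameters(), lr=1e-3, momentum=0.9)
    class_stat = torch.ones(num_classes)              # per-class momentum statistic
    loader = DataLoader(dataset, batch_size=128, shuffle=True)

    for epoch in range(max_epoch):
        if epoch >= 1:                                # "epoch >= 2" in the 1-indexed algorithm
            loader = rebalanced_loader(dataset, labels, class_stat)
        for images, targets in loader:
            images, targets = images.to(device), targets.to(device)
            with torch.no_grad():
                feat = model.encode_image(images)
            feat = F.normalize(adapter(feat), dim=-1)
            logits = model.logit_scale.exp() * feat @ txt_feat.t()  # Logit(model, I, T)
            loss = F.cross_entropy(logits, targets)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            # Momentum update of the per-class statistic from batch confidences.
            probs = logits.softmax(dim=-1).detach().cpu()
            for c in targets.unique().cpu():
                conf = probs[targets.cpu() == c, c].mean()
                class_stat[c] = stat_momentum * class_stat[c] + (1 - stat_momentum) * conf
        torch.save(adapter.state_dict(), "phase2_adapter.pth")      # pth_epoch = {weight}
    return adapter
```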
4. Experiments
4.1. Long-Tailed Image Datasets
4.1.1. CIFAR100-LT
4.1.2. Places-LT
4.1.3. ImageNet-LT
4.2. Experimental Design and Validation
4.2.1. Experimental Results and Analysis of CIFAR100-LT
4.2.2. Experimental Results and Analysis of ImageNet-LT
4.2.3. Experimental Results and Analysis of Places-LT
4.3. Ablation Experiments
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Tas, S.; Sari, O.; Dalveren, Y.; Pazar, S.; Kara, A.; Derawi, M. Deep learning-based vehicle classification for low quality images. Sensors 2022, 22, 4740. [Google Scholar] [CrossRef] [PubMed]
- Berwo, M.A.; Khan, A.; Fang, Y.; Fahim, H.; Javaid, S.; Mahmood, J.; Abideen, Z.U.; M.S., S. Deep Learning Techniques for Vehicle Detection and Classification from Images/Videos: A Survey. Sensors 2023, 23, 4832. [Google Scholar] [CrossRef] [PubMed]
- Wang, Z.; Shen, H.; Xiong, W.; Zhang, X.; Hou, J. Method for Diagnosing Bearing Faults in Electromechanical Equipment Based on Improved Prototypical Networks. Sensors 2023, 23, 4485. [Google Scholar] [CrossRef] [PubMed]
- Kang, B.; Xie, S.; Rohrbach, M.; Yan, Z.; Gordo, A.; Feng, J.; Kalantidis, Y. Decoupling representation and classifier for long-tailed recognition. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 30 April 2020. [Google Scholar]
- Wang, T.; Li, Y.; Kang, B.; Li, J.; Liew, J.; Tang, S.; Hoi, S.; Feng, J. The devil is in classification: A simple framework for long-tail instance segmentation. In Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; pp. 728–744. [Google Scholar]
- Park, M.; Song, H.J.; Kang, D.O. Imbalanced Classification via Feature Dictionary-Based Minority Oversampling. IEEE Access 2022, 10, 34236–34245. [Google Scholar] [CrossRef]
- Li, T.; Wang, Y.; Liu, L.; Chen, L.; Chen, C.L.P. Subspace-based minority oversampling for imbalance classification. Inf. Sci. 2023, 621, 371–388. [Google Scholar] [CrossRef]
- Lee, Y.S.; Bang, C.C. Framework for the Classification of Imbalanced Structured Data Using Under-Sampling and Convolutional Neural Network. Inf. Syst. Front. 2021, 24, 1795–1809. [Google Scholar] [CrossRef]
- Lehmann, D.; Ebner, M. Subclass-Based Undersampling for Class-Imbalanced Image Classification. In Proceedings of the 17th International Conference on Computer Vision Theory and Applications, Online, 6–8 February 2022; pp. 493–500. [Google Scholar]
- Farshidvard, A.; Hooshmand, F.; MirHassani, S.A. A novel two-phase clustering-based under-sampling method for imbalanced classification problems. Expert Syst. Appl. 2023, 213, 119003. [Google Scholar] [CrossRef]
- Ding, H.; Wei, B.; Gu, Z.; Zheng, H.; Zheng, B. KA-Ensemble: Towards imbalanced image classification ensembling under-sampling and over-sampling. Multimed. Tools Appl. 2020, 79, 14871–14888. [Google Scholar] [CrossRef]
- Swana, E.F.; Doorsamy, W.; Bokoro, P. Tomek link and SMOTE approaches for machine fault classification with an imbalanced dataset. Sensors 2022, 22, 3246. [Google Scholar] [CrossRef] [PubMed]
- Gupta, A.; Dollar, P.; Girshick, R. Lvis: A dataset for large vocabulary instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5356–5364. [Google Scholar]
- Peng, J.; Bu, X.; Sun, M.; Zhang, Z.; Tan, T.; Yan, J. Large-scale object detection in the wild from imbalanced multi-labels. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9709–9718. [Google Scholar]
- Hu, X.; Jiang, Y.; Tang, K.; Chen, J.; Miao, C.; Zhang, H. Learning to segment the tail. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 14045–14054. [Google Scholar]
- Wu, J.; Song, L.; Wang, T.; Zhang, Q.; Yuan, J. Forest r-cnn: Large-vocabulary long-tailed object detection and instance segmentation. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 1570–1578. [Google Scholar]
- Zhou, B.; Cui, Q.; Wei, X.S.; Chen, Z.-M. BBN: Bilateral-branch network with cumulative learning for long-tailed visual recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9719–9728. [Google Scholar]
- Zang, Y.; Huang, C.; Loy, C.C. FASA: Feature augmentation and sampling adaptation for long-tailed instance segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 3457–3466. [Google Scholar]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
- Hermans, A.; Beyer, L.; Leibe, B. In Defense of the Triplet Loss for Person Re-Identification. arXiv 2017, arXiv:1703.07737. [Google Scholar]
- Cui, Y.; Jia, M.; Lin, T.Y.; Song, Y.; Belongie, S. Class-balanced loss based on effective number of samples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9268–9277. [Google Scholar]
- Cao, K.; Wei, C.; Gaidon, A.; Arechiga, N.; Ma, T. Learning imbalanced datasets with label-distribution-aware margin loss. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; p. 32. [Google Scholar]
- Wu, T.; Huang, Q.; Liu, Z.; Wang, Y.; Lin, D. Distribution-balanced loss for multi-label classification in long-tailed datasets. In Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; pp. 162–178. [Google Scholar]
- Tan, J.; Wang, C.; Li, B.; Li, Q.; Ouyang, W.; Yin, C.; Yan, J. Equalization loss for long-tailed object recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11662–11671. [Google Scholar]
- Tan, J.; Lu, X.; Zhang, G.; Yin, C.; Li, Q. Equalization loss v2: A new gradient balance approach for long-tailed object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 1685–1694. [Google Scholar]
- Wang, J.; Zhang, W.; Zang, Y.; Cao, Y.; Pang, J.; Gong, T.; Chen, K.; Liu, Z.; Loy, C.C.; Lin, D. Seesaw loss for long-tailed instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 9695–9704. [Google Scholar]
- Hong, Y.; Han, S.; Choi, K.; Seo, S.; Kim, B.; Chang, B. Disentangling label distribution for long-tailed visual recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 6626–6636. [Google Scholar]
- Ren, J.; Yu, C.; Ma, X.; Ma, X.; Zhao, H.; Yi, S.; Li, H. Balanced meta-softmax for long-tailed visual recognition. Adv. Neural Inf. Process. Syst. 2020, 33, 4175–4186. [Google Scholar]
- Deng, Z.; Liu, H.; Wang, Y.; Wang, C.; Yu, Z.; Sun, X. PML: Progressive margin loss for long-tailed age classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 10503–10512. [Google Scholar]
- Wu, T.; Liu, Z.; Huang, Q.; Wang, Y.; Lin, D. Adversarial robustness under long-tailed distribution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 8659–8668. [Google Scholar]
- Xiao, L.; Xu, J.; Zhao, D.; Shang, E.; Zhu, Q.; Dai, B. Adversarial and Random Transformations for Robust Domain Adaptation and Generalization. Sensors 2023, 23, 5273. [Google Scholar] [CrossRef] [PubMed]
- Park, S.; Kim, J.; Jeong, H.-Y.; Kim, T.-K.; Yoo, J. C2RL: Convolutional-Contrastive Learning for Reinforcement Learning Based on Self-Pretraining for Strong Augmentation. Sensors 2023, 23, 4946. [Google Scholar] [CrossRef] [PubMed]
- Zhong, Z.; Cui, J.; Liu, S.; Jia, J. Improving calibration for long-tailed recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 16489–16498. [Google Scholar]
- Li, S.; Gong, K.; Liu, C.H.; Wang, Y.; Qiao, F.; Cheng, X. Metasaug: Meta semantic augmentation for long-tailed visual recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 5212–5221. [Google Scholar]
- Wang, Y.; Pan, X.; Song, S.; Zhang, H.; Wu, C.; Huang, G. Implicit semantic data augmentation for deep networks. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019; p. 32. [Google Scholar]
- Yin, X.; Yu, X.; Sohn, K.; Liu, X.; Chandraker, M. Feature transfer learning for face recognition with under-represented data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5704–5713. [Google Scholar]
- Liu, J.; Sun, Y.; Han, C.; Dou, Z.; Li, W. Deep representation learning on long-tailed data: A learnable embedding augmentation perspective. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2970–2979. [Google Scholar]
- Chu, P.; Bian, X.; Liu, S.; Ling, H. Feature space augmentation for long-tailed data. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 694–710. [Google Scholar]
- Cui, Y.; Song, Y.; Sun, C.; Howard, A.; Belongie, S. Large scale fine-grained categorization and domain-specific transfer learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4109–4118. [Google Scholar]
- Yang, Y.; Xu, Z. Rethinking the value of labels for improving class-imbalanced learning. Adv. Neural Inf. Process. Syst. 2020, 33, 19290–19301. [Google Scholar]
- He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9729–9738. [Google Scholar]
- Li, T.; Wang, L.; Wu, G. Self-supervision to distillation for long-tailed visual recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 630–639. [Google Scholar]
- Wei, H.; Tao, L.; Xie, R.; Feng, L.; An, B. Open-Sampling: Exploring Out-of-Distribution Data for Re-balancing Long-tailed Datasets. In Proceedings of the International Conference on Machine Learning (PMLR), Baltimore, MD, USA, 17–23 July 2022; pp. 23615–23630. [Google Scholar]
- Changpinyo, S.; Sharma, P.; Ding, N.; Soricut, R. Conceptual 12m: Pushing web-scale image-text pre-training to recognize long-tail visual concepts. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 3558–3568. [Google Scholar]
- Xiang, L.; Ding, G.; Han, J. Learning from multiple experts: Self-paced knowledge distillation for long-tailed classification. In Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Part V 16; Springer International Publishing: Cham, Switzerland, 2020; pp. 247–263. [Google Scholar]
- Wang, X.; Lian, L.; Miao, Z.; Liu, Z.; Yu, S.X. Long-tailed Recognition by Routing Diverse Distribution-Aware Experts. In Proceedings of the International Conference on Learning Representations, Virtual, 3–7 May 2021. [Google Scholar]
- He, Y.Y.; Wu, J.; Wei, X.S. Distilling virtual examples for long-tailed recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 235–244. [Google Scholar]
- Wei, C.; Sohn, K.; Mellina, C.; Yuille, A.; Yang, F. Crest: A class-rebalancing self-training framework for imbalanced semi-supervised learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 10857–10866. [Google Scholar]
- Zhang, C.; Pan, T.Y.; Li, Y.; Hu, H.; Xuan, D.; Changpinyo, S.; Gong, B.; Chao, W.-L. MosaicOS: A simple and effective use of object-centric images for long-tailed object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 417–427. [Google Scholar]
- Guo, H.; Wang, S. Long-tailed multi-label visual recognition by collaborative training on uniform and re-balanced samplings. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 15089–15098. [Google Scholar]
- Cai, J.; Wang, Y.; Hwang, J.N. Ace: Ally complementary experts for solving long-tailed recognition in one-shot. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 112–121. [Google Scholar]
- Cui, J.; Liu, S.; Tian, Z.; Zhong, Z.; Jia, J. Reslt: Residual learning for long-tailed recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 3695–3706. [Google Scholar] [CrossRef] [PubMed]
- Zhang, Y.; Hooi, B.; Hong, L.; Feng, J. Test-Agnostic Long-Tailed Recognition by Test-Time Aggregating Diverse Experts with Self-Supervision. arXiv 2021, arXiv:2107.09249. [Google Scholar]
- Tang, K.; Huang, J.; Zhang, H. Long-tailed classification by keeping the good and removing the bad momentum causal effect. Adv. Neural Inf. Process. Syst. 2020, 33, 1513–1524. [Google Scholar]
- Zhou, B.; Lapedriza, A.; Khosla, A.; Oliva, A.; Torralba, A. Places: A 10 million image database for scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 1452–1464. [Google Scholar] [CrossRef] [PubMed]
- Zhu, L.; Yang, Y. Inflated episodic memory with region self-attention for long-tailed visual recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 4344–4353. [Google Scholar]
- Kang, B.; Li, Y.; Xie, S.; Feng, J. Exploring balanced feature spaces for representation learning. In Proceedings of the International Conference on Learning Representations, Virtual, 3–7 May 2021. [Google Scholar]
- Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning (PMLR), Virtual, 18–24 July 2021; pp. 8748–8763. [Google Scholar]
- Jia, C.; Yang, Y.; Xia, Y.; Chen, Y.-T.; Parekh, Z.; Pham, H.; Le, Q.V.; Sung, Y.; Li, Z.; Duerig, T. Scaling up visual and vision-language representation learning with noisy text supervision. In Proceedings of the International Conference on Machine Learning (PMLR), Virtual, 18–24 July 2021; pp. 4904–4916. [Google Scholar]
- Cubuk, E.D.; Zoph, B.; Shlens, J.; Le, Q.V. Randaugment: Practical automated data augmentation with a reduced search space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 702–703. [Google Scholar]
- Cui, J.; Zhong, Z.; Liu, S.; Yu, B.; Jia, J. Parametric contrastive learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 715–724. [Google Scholar]
- Gururangan, S.; Marasović, A.; Swayamdipta, S.; Lo, K.; Beltagy, I.; Downey, D.; Smith, N.A. Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks. arXiv 2020, arXiv:2004.10964. [Google Scholar]
- Liu, Z.; Miao, Z.; Zhan, X.; Wang, J.; Gong, B.; Yu, S.X. Large-scale long-tailed recognition in an open world. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2537–2546. [Google Scholar]
- Ma, T.; Geng, S.; Wang, M.; Shao, J.; Lu, J.; Li, H.; Gao, P.; Qiao, Y. A Simple Long-Tailed Recognition Baseline via Vision-Language Model. arXiv 2021, arXiv:2111.14745. [Google Scholar]
Method | Advantages | Limitations |
---|---|---|
Data Re-Sampling | Reduces the imbalance between head and tail classes to some extent. | May cause overfitting of the tail classes and underfitting of the head classes. |
Data Re-Weighting | Assigns class-specific loss weights, strengthening the learning of tail classes. | Choosing appropriate weights for each class is difficult, and the best weights can differ greatly across long-tailed datasets. |
Data Augmentation | Expands the tail classes with augmented samples. | Cannot introduce genuinely new, informative samples. |
Transfer Learning | Transfers knowledge from the head classes to the tail classes. | Requires a more complex model or module design, which can make the model difficult to train. |
Ensemble Learning | Ensembles multiple expert models. | Requires more computational resources. |
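As one concrete instance of the re-weighting idea summarized in the table, the class-balanced loss of Cui et al. weights each class by the inverse of its effective number of samples. The short sketch below illustrates this; the beta value and class counts are example assumptions.

```python
# Illustrative class-balanced re-weighting (effective number of samples).
import torch

def class_balanced_weights(samples_per_class, beta=0.9999):
    counts = torch.tensor(samples_per_class, dtype=torch.float)
    effective_num = 1.0 - torch.pow(beta, counts)        # E_n = (1 - beta^n)
    weights = (1.0 - beta) / effective_num                # w_c ∝ 1 / E_n
    return weights / weights.sum() * len(samples_per_class)  # normalize to num_classes

# Example: a 3-class long-tailed distribution; tail classes receive larger loss weights.
weights = class_balanced_weights([5000, 500, 50])
criterion = torch.nn.CrossEntropyLoss(weight=weights)
```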
Name | Model/Parameter |
---|---|
CPU | Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40 GHz |
GPU | NVIDIA A100 40 GB × 8 |
Memory | 128 GB |
Hard disk | 1 TB |
Operating system | Ubuntu 20.04 |
CUDA | CUDA 10.1 |
Deep learning framework | PyTorch 1.7.1 |
Development language | Python 3.7 |
Model | Backbone | Accuracy (Head) | Accuracy (Medium) | Accuracy (Tail) | Accuracy (All) | F1 |
---|---|---|---|---|---|---|
OLTR [63] | ResNet-32 | 61.8% | 41.4% | 17.6% | 41.2% | 52.3% |
LDAM [22] | ResNet-32 | 61.5% | 41.7% | 20.2% | 42.0% | 52.9% |
cRT [4] | ResNet-32 | 64.0% | 44.8% | 18.1% | 43.3% | 51.9% |
RIDE [46] | ResNet-32 | 69.3% | 49.3% | 26.0% | 49.1% | 57.3% |
TADE [53] | ResNet-32 | 65.4% | 49.3% | 29.3% | 49.8% | 58.8% |
BALLAD [64] | ResNet-50 | 62.4% | 52.3% | 38.2% | 51.6% | 62.1% |
ECVL | ResNet-50 | 65.0% | 57.2% | 46.5% | 55.8% | 70.6% |
Model | Backbone | Accuracy (Head) | Accuracy (Medium) | Accuracy (Tail) | Accuracy (All) | F1 |
---|---|---|---|---|---|---|
OLTR [63] | ResNeXt-50 | 43.2% | 35.1% | 18.5% | 35.6% | 47.6% |
cRT [4] | ResNeXt-50 | 61.8% | 46.2% | 27.4% | 49.6% | 53.7% |
LWS [4] | ResNeXt-152 | 62.2% | 50.1% | 35.8% | 52.8% | - |
LWS [4] | ResNeXt-50 | 60.2% | 47.2% | 30.3% | 49.9% | 50.6% |
ResLT [52] | ResNeXt-152 | 63.5% | 50.4% | 34.2% | 53.3% | - |
ResLT [52] | ResNeXt-50 | 63.0% | 50.5% | 35.5% | 52.9% | 55.2% |
Balanced Softmax [28] | ResNeXt-101 | 63.3% | 53.3% | 40.3% | 55.1% | - |
Balanced Softmax [28] | ResNet-50 | 66.7% | 52.9% | 33.0% | 55.0% | - |
PaCo [61] | ResNeXt-50 | 67.7% | 53.8% | 34.2% | 56.2% | - |
PaCo [61] | ResNet-50 | 65.0% | 55.7% | 38.2% | 57.0% | 62.3% |
BALLAD [64] | ResNeXt-50 | 67.5% | 56.9% | 36.7% | 58.2% | - |
BALLAD [64] | ResNet-50 | 71.0% | 66.3% | 59.5% | 67.2% | 66.0% |
ECVL | ResNet-50 | 73.2% | 69.8% | 67.4% | 70.6% | 77.2% |
Model | Backbone | Accuracy (Head) | Accuracy (Medium) | Accuracy (Tail) | Accuracy (All) | F1 |
---|---|---|---|---|---|---|
OLTR [63] | ResNet-152 | 44.7% | 37.0% | 25.3% | 35.9% | 46.4% |
cRT [4] | ResNet-152 | 42.0% | 37.6% | 24.9% | 36.7% | 45.5% |
LWS [4] | ResNet-152 | 40.6% | 39.1% | 28.6% | 37.6% | 46.2% |
ResLT [52] | ResNet-152 | 39.8% | 43.6% | 31.4% | 39.8% | 51.2% |
PaCo [61] | ResNet-50 | 37.5% | 47.2% | 33.9% | 41.2% | 52.3% |
BALLAD [64] | ResNet-50 | 46.7% | 48.0% | 42.7% | 46.5% | 56.8% |
BALLAD [64] | ResNet-101 | 48.0% | 48.6% | 46.0% | 47.9% | - |
BALLAD [64] | ViT-B/16 | 49.3% | 50.2% | 48.4% | 49.5% | - |
ECVL | ResNet-50 | 48.6% | 48.3% | 44.0% | 47.2% | 59.6% |
Module | Accuracy (Head) | Accuracy (Medium) | Accuracy (Tail) | Accuracy (All) | F1 |
---|---|---|---|---|---|
Baseline (no momentum contrast loss, no random augment) | 62.4% | 52.3% | 38.2% | 51.6% | 62.1% |
+ momentum contrast loss | 62.5% | 53.3% | 40.1% | 52.4% | 65.8% |
+ momentum contrast loss + random augment | 65.0% | 57.2% | 46.5% | 55.8% | 70.6% |
Module | Accuracy (Head) | Accuracy (Medium) | Accuracy (Tail) | Accuracy (All) | F1 |
---|---|---|---|---|---|
Baseline (no momentum contrast loss, no random augment) | 71.0% | 66.3% | 59.5% | 67.2% | 66.0% |
+ momentum contrast loss | 72.5% | 68.7% | 63.2% | 69.4% | 70.8% |
+ momentum contrast loss + random augment | 73.2% | 69.9% | 67.4% | 70.6% | 77.2% |
Module | Accuracy (Head) | Accuracy (Medium) | Accuracy (Tail) | Accuracy (All) | F1 |
---|---|---|---|---|---|
Baseline (no momentum contrast loss, no random augment) | 46.7% | 48.0% | 42.7% | 46.2% | 56.8% |
+ momentum contrast loss | 47.0% | 47.5% | 43.2% | 46.5% | 58.3% |
+ momentum contrast loss + random augment | 48.6% | 48.3% | 44.0% | 47.2% | 59.6% |