Hierarchical Deep Feature Fusion and Ensemble Learning for Enhanced Brain Tumor MRI Classification
Abstract
1. Introduction
- We introduced a hybrid approach combining a feature-level ensemble of pre-trained ViT models with a classifier-level ensemble of fine-tuned ML classifiers, significantly enhancing brain tumor classification accuracy.
- We implemented extensive preprocessing techniques and data augmentation strategies to improve data quality and address challenges such as noise and variability in MRI datasets.
- We conducted systematic hyperparameter tuning for multiple ML classifiers, demonstrating the pivotal role of this process in achieving superior performance and diagnostic reliability.
- We validated the proposed framework on two publicly available MRI brain tumor datasets from Kaggle, achieving SOTA performance and showcasing the effectiveness of integrating DL and ML models for medical image analysis.
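The two-level design in these contributions (a feature-level ensemble of ViT embeddings feeding a classifier-level ensemble) can be sketched as follows. This is a minimal illustration, not the paper's implementation: randomly generated vectors stand in for the deep features, which in the actual pipeline come from pre-trained backbones such as vit_base_patch16_224, and the classifier hyperparameters are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(42)

# Stand-ins for deep features from two pre-trained ViT backbones
# (in the real pipeline: forward passes through e.g. vit_base_patch16_224).
n = 400
feats_vit_a = rng.normal(size=(n, 64))
feats_vit_b = rng.normal(size=(n, 32))
labels = (feats_vit_a[:, 0] + feats_vit_b[:, 0] > 0).astype(int)  # synthetic binary labels

# Feature-level ensemble: concatenate the per-model embeddings.
X = np.hstack([feats_vit_a, feats_vit_b])
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25, random_state=0)

# Classifier-level ensemble: soft-vote over several fine-tuned ML classifiers.
ensemble = VotingClassifier(
    estimators=[
        ("svm_rbf", SVC(kernel="rbf", C=10, gamma="scale", probability=True, random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5, weights="distance")),
    ],
    voting="soft",
)
ensemble.fit(X_tr, y_tr)
accuracy = ensemble.score(X_te, y_te)
```

Soft voting averages the per-classifier class probabilities, which is why the SVM must be fitted with probability=True.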
2. Related Work
2.1. Brain Tumor Classification Using Traditional ML
2.2. Brain Tumor Classification Using DL
2.3. Transformer-Based Models
2.3.1. Hybrid CNN–ViT Models
2.3.2. Brain Tumor MRI (Classification/Segmentation)
2.4. Summary
3. Proposed Methodology
3.1. Overview
3.2. Model Selection and Fusion Rationale
3.3. Datasets
3.4. Pre-Processing
3.5. Deep Feature Extraction Using Pre-Trained Vision Transformers
3.6. Brain Tumor Classification Using ML Classifiers
3.6.1. Multilayer Perceptron
3.6.2. Gaussian Naive Bayes
3.6.3. AdaBoost
3.6.4. K-Nearest Neighbors
3.6.5. Random Forest
3.6.6. Support Vector Machine
3.7. Hyperparameter Tuning for ML Models
4. Experimental Setup
4.1. Implementation Details
4.1.1. Deep Feature Evaluation and Selection
4.1.2. Evaluation Criteria
4.1.3. Rationale for Selection
4.1.4. Ensemble of Deep Features
4.2. Feature Ensemble vs. Classifier Ensemble
4.3. Confidence Interval Estimation
4.4. Feature Ensemble Methodology
4.4.1. Classifier Ensemble Methodology
4.4.2. Insights and Advantages
5. Experimental Results
5.1. Experiment 1: Hyperparameter Tuning of ML Classifiers
5.2. Experiment 2: Using an Ensemble of Deep Features with ML Classifiers
5.2.1. Feature Ensemble Creation
5.2.2. Performance Comparison
5.3. Experiment 3: Using an Ensemble of ML Classifiers with Various Preprocessing Strategies
5.4. Impact of Preprocessing on Classification Performance
6. Discussion
Limitations and Future Work
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Zeng, L.; Zhang, H.H. Robust brain MRI image classification with SIBOW-SVM. Comput. Med. Imaging Graph. 2024, 118, 102451. [Google Scholar] [CrossRef]
- da Costa Nascimento, J.J.; Marques, A.G.; do Nascimento Souza, L.; de Mattos Dourado, C.M.J.; da Silva Barros, A.C.; de Albuquerque, V.H.C.; de Freitas Sousa, L.F. A novel generative model for brain tumor detection using magnetic resonance imaging. Comput. Med. Imaging Graph. 2025, 121, 102498. [Google Scholar] [CrossRef]
- Musthafa, M.M.; Kumar V, V.; Guluwadi, S. Enhancing brain tumor detection in MRI images through explainable AI using Grad-CAM with Resnet 50. BMC Med. Imaging 2024, 24, 107. [Google Scholar]
- Lei, J.; Dai, L.; Jiang, H.; Wu, C.; Zhang, X.; Zhang, Y.; Yao, J.; Xie, W.; Zhang, Y.; Li, Y.; et al. Unibrain: Universal brain mri diagnosis with hierarchical knowledge-enhanced pre-training. Comput. Med. Imaging Graph. 2025, 122, 102516. [Google Scholar] [CrossRef] [PubMed]
- Aygün, M.; Şahin, Y.H.; Ünal, G. Multi modal convolutional neural networks for brain tumor segmentation. arXiv 2018, arXiv:1809.06191. [Google Scholar] [CrossRef]
- Dehkordi, A.A.; Hashemi, M.; Neshat, M.; Mirjalili, S.; Sadiq, A.S. Brain tumor detection and classification using a new evolutionary convolutional neural network. arXiv 2022, arXiv:2204.12297. [Google Scholar] [CrossRef]
- Gundogan, E. A Novel Hybrid Deep Learning Model Enhanced with Explainable AI for Brain Tumor Multi-Classification from MRI Images. Appl. Sci. 2025, 15, 5412. [Google Scholar] [CrossRef]
- Abd-Ellah, M.K.; Awad, A.I.; Khalaf, A.A.; Hamed, H.F. A review on brain tumor diagnosis from MRI images: Practical implications, key achievements, and lessons learned. Magn. Reson. Imaging 2019, 61, 300–318. [Google Scholar] [CrossRef]
- Magadza, T.; Viriri, S. Deep learning for brain tumor segmentation: A survey of state-of-the-art. J. Imaging 2021, 7, 19. [Google Scholar] [CrossRef]
- Madgi, M.; Giraddi, S.; Bharamagoudar, G.; Madhur, M. Brain tumor classification and segmentation using deep learning. In Smart Computing Techniques and Applications: Proceedings of the Fourth International Conference on Smart Computing and Informatics; Springer: Berlin/Heidelberg, Germany, 2021; Volume 2, pp. 201–208. [Google Scholar]
- Nadeem, M.W.; Ghamdi, M.A.A.; Hussain, M.; Khan, M.A.; Khan, K.M.; Almotiri, S.H.; Butt, S.A. Brain tumor analysis empowered with deep learning: A review, taxonomy, and future challenges. Brain Sci. 2020, 10, 118. [Google Scholar] [CrossRef]
- Faradibah, A.; Widyawati, D.; Syahar, A.U.T.; Jabir, S.R.; Belluano, P.L.L. Comparison analysis of random forest classifier, support vector machine, and artificial neural network performance in multiclass brain tumor classification. Indones. J. Data Sci. 2023, 4, 55–63. [Google Scholar] [CrossRef]
- Latif, G.; Ben Brahim, G.; Iskandar, D.A.; Bashar, A.; Alghazo, J. Glioma Tumors’ classification using deep-neural-network-based features with SVM classifier. Diagnostics 2022, 12, 1018. [Google Scholar] [CrossRef]
- Ahmad, S.; Choudhury, P.K. On the performance of deep transfer learning networks for brain tumor detection using MR images. IEEE Access 2022, 10, 59099–59114. [Google Scholar] [CrossRef]
- Takahashi, S.; Sakaguchi, Y.; Kouno, N.; Takasawa, K.; Ishizu, K.; Akagi, Y.; Aoyama, R.; Teraya, N.; Bolatkan, A.; Shinkai, N.; et al. Comparison of vision transformers and convolutional neural networks in medical image analysis: A systematic review. J. Med. Syst. 2024, 48, 84. [Google Scholar] [CrossRef] [PubMed]
- Matsoukas, C.; Haslum, J.F.; Söderberg, M.; Smith, K. Pretrained vits yield versatile representations for medical images. arXiv 2023, arXiv:2303.07034. [Google Scholar] [CrossRef]
- Dahan, S.; Fawaz, A.; Williams, L.Z.; Yang, C.; Coalson, T.S.; Glasser, M.F.; Edwards, A.D.; Rueckert, D.; Robinson, E.C. Surface vision transformers: Attention-based modelling applied to cortical analysis. In Proceedings of the International Conference on Medical Imaging with Deep Learning, PMLR, Zurich, Switzerland, 6–8 July 2022; pp. 282–303. [Google Scholar]
- Feng, C.M.; Yan, Y.; Chen, G.; Xu, Y.; Hu, Y.; Shao, L.; Fu, H. Multimodal transformer for accelerated MR imaging. IEEE Trans. Med. Imaging 2022, 42, 2804–2816. [Google Scholar] [CrossRef]
- Dahan, S.; Williams, L.Z.; Rueckert, D.; Robinson, E.C. The multiscale surface vision transformer. arXiv 2024, arXiv:2303.11909v3. [Google Scholar]
- Thakur, G.K.; Thakur, A.; Kulkarni, S.; Khan, N.; Khan, S. Deep learning approaches for medical image analysis and diagnosis. Cureus 2024, 16, e59507. [Google Scholar] [CrossRef]
- Babayomi, M.; Olagbaju, O.A.; Kadiri, A.A. Convolutional xgboost (c-xgboost) model for brain tumor detection. arXiv 2023, arXiv:2301.02317. [Google Scholar] [CrossRef]
- Zhu, G.; Jiang, B.; Tong, L.; Xie, Y.; Zaharchuk, G.; Wintermark, M. Applications of deep learning to neuro-imaging techniques. Front. Neurol. 2019, 10, 869. [Google Scholar] [CrossRef]
- Azizova, A.; Prysiazhniuk, Y.; Wamelink, I.J.; Cakmak, M.; Kaya, E.; Wesseling, P.; de Witt Hamer, P.C.; Verburg, N.; Petr, J.; Barkhof, F.; et al. Preoperative prediction of diffuse glioma type and grade in adults: A gadolinium-free MRI-based decision tree. Eur. Radiol. 2025, 35, 1242–1254. [Google Scholar] [CrossRef]
- Al Yassin, A.; Sadaghiani, M.S.; Mohan, S.; Bryan, R.N.; Nasrallah, I. It is About “Time”: Academic Neuroradiologist Time Distribution for Interpreting Brain MRIs. Acad. Radiol. 2018, 25, 1521–1525. [Google Scholar] [CrossRef]
- Sieber, V.; Rusche, T.; Yang, S.; Stieltjes, B.; Fischer, U.; Trebeschi, S.; Cattin, P.; Nguyen-Kim, D.L.; Psychogios, M.N.; Lieb, J.M.; et al. Automated assessment of brain MRIs in multiple sclerosis patients significantly reduces reading time. Neuroradiology 2024, 66, 2171–2176. [Google Scholar] [CrossRef]
- Aamir, M.; Rahman, Z.; Bhatti, U.A.; Abro, W.A.; Bhutto, J.A.; He, Z. An automated deep learning framework for brain tumor classification using MRI imagery. Sci. Rep. 2025, 15, 17593. [Google Scholar] [CrossRef] [PubMed]
- Kong, C.; Yan, D.; Liu, K.; Yin, Y.; Ma, C. Multiple deep learning models based on MRI images in discriminating glioblastoma from solitary brain metastases: A multicentre study. BMC Med. Imaging 2025, 25, 171. [Google Scholar] [CrossRef] [PubMed]
- Ural, B. A computer-based brain tumor detection approach with advanced image processing and probabilistic neural network methods. J. Med. Biol. Eng. 2018, 38, 867–879. [Google Scholar] [CrossRef]
- Ullah, Z.; Farooq, M.U.; Lee, S.H.; An, D. A hybrid image enhancement based brain MRI images classification technique. Med. Hypotheses 2020, 143, 109922. [Google Scholar] [CrossRef]
- Varuna Shree, N.; Kumar, T. Identification and classification of brain tumor MRI images with feature extraction using DWT and probabilistic neural network. Brain Inform. 2018, 5, 23–30. [Google Scholar] [CrossRef]
- Kharrat, A.; Gasmi, K.; Messaoud, M.B.; Benamrane, N.; Abid, M. A hybrid approach for automatic classification of brain MRI using genetic algorithm and support vector machine. Leonardo J. Sci. 2010, 17, 71–82. [Google Scholar]
- Rajan, P.; Sundar, C. Brain tumor detection and segmentation by intensity adjustment. J. Med. Syst. 2019, 43, 282. [Google Scholar] [CrossRef]
- Çinar, A.; Yildirim, M. Detection of tumors on brain MRI images using the hybrid convolutional neural network architecture. Med. Hypotheses 2020, 139, 109684. [Google Scholar] [CrossRef]
- Mehnatkesh, H.; Jalali, S.M.J.; Khosravi, A.; Nahavandi, S. An intelligent driven deep residual learning framework for brain tumor classification using MRI images. Expert Syst. Appl. 2023, 213, 119087. [Google Scholar] [CrossRef]
- Deepak, S.; Ameer, P. Brain tumor classification using deep CNN features via transfer learning. Comput. Biol. Med. 2019, 111, 103345. [Google Scholar] [CrossRef] [PubMed]
- Díaz-Pernas, F.J.; Martínez-Zarzuela, M.; Antón-Rodríguez, M.; González-Ortega, D. A deep learning approach for brain tumor classification and segmentation using a multiscale convolutional neural network. Healthcare 2021, 9, 153. [Google Scholar] [CrossRef] [PubMed]
- Khan, M.S.I.; Rahman, A.; Debnath, T.; Karim, M.R.; Nasir, M.K.; Band, S.S.; Mosavi, A.; Dehzangi, I. Accurate brain tumor detection using deep convolutional neural network. Comput. Struct. Biotechnol. J. 2022, 20, 4733–4745. [Google Scholar] [CrossRef] [PubMed]
- Paul, J.S.; Plassard, A.J.; Landman, B.A.; Fabbri, D. Deep learning for brain tumor classification. In Medical Imaging 2017: Biomedical Applications in Molecular, Structural, and Functional Imaging; SPIE: Bellingham, WA, USA, 2017; Volume 10137, pp. 253–268. [Google Scholar]
- Hemanth, D.J.; Anitha, J.; Naaji, A.; Geman, O.; Popescu, D.E.; Son, L.H. A modified deep convolutional neural network for abnormal brain image classification. IEEE Access 2018, 7, 4275–4283. [Google Scholar] [CrossRef]
- Shen, Y.; Guo, P.; Wu, J.; Huang, Q.; Le, N.; Zhou, J.; Jiang, S.; Unberath, M. Movit: Memorizing vision transformers for medical image analysis. In Proceedings of the International Workshop on Machine Learning in Medical Imaging, Vancouver, BC, Canada, 8 October 2023; Springer: Berlin/Heidelberg, Germany, 2023; pp. 205–213. [Google Scholar]
- Xia, K.; Wang, J. Recent advances of transformers in medical image analysis: A comprehensive review. MedComm–Future Med. 2023, 2, e38. [Google Scholar] [CrossRef]
- Henry, E.U.; Emebob, O.; Omonhinmin, C.A. Vision transformers in medical imaging: A review. arXiv 2022, arXiv:2211.10043. [Google Scholar] [CrossRef]
- Kang, J.; Ullah, Z.; Gwak, J. MRI-based brain tumor classification using ensemble of deep features and machine learning classifiers. Sensors 2021, 21, 2222. [Google Scholar] [CrossRef]
- Ahmed, M.M.; Hossain, M.M.; Islam, M.R.; Ali, M.S.; Nafi, A.A.N.; Ahmed, M.F.; Ahmed, K.M.; Miah, M.S.; Rahman, M.M.; Niu, M.; et al. Brain tumor detection and classification in MRI using hybrid ViT and GRU model with explainable AI in Southern Bangladesh. Sci. Rep. 2024, 14, 22797. [Google Scholar] [CrossRef]
- Hamada, A. Br35H Brain Tumor Detection 2020 Dataset. 2020. Available online: https://www.kaggle.com/datasets/ahmedhamada0/brain-tumor-detection (accessed on 1 August 2020).
- Chakrabarty, N. Brain MRI Images for Brain Tumor Detection. 2019. Available online: https://www.kaggle.com/datasets/navoneel/brain-mri-images-for-brain-tumor-detection (accessed on 1 August 2025).
- Shamshad, F.; Khan, S.; Zamir, S.W.; Khan, M.H.; Hayat, M.; Khan, F.S.; Fu, H. Transformers in medical imaging: A survey. Med. Image Anal. 2023, 88, 102802. [Google Scholar] [CrossRef]
- Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.L.; Zhou, Y. Transunet: Transformers make strong encoders for medical image segmentation. arXiv 2021, arXiv:2102.04306. [Google Scholar] [CrossRef]
- Hatamizadeh, A.; Tang, Y.; Nath, V.; Yang, D.; Myronenko, A.; Landman, B.; Roth, H.R.; Xu, D. Unetr: Transformers for 3d medical image segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2022; pp. 574–584. [Google Scholar]
- Chaoyang, Z.; Shibao, S.; Wenmao, H.; Pengcheng, Z. FDR-TransUNet: A novel encoder-decoder architecture with vision transformer for improved medical image segmentation. Comput. Biol. Med. 2024, 169, 107858. [Google Scholar] [CrossRef] [PubMed]
- Sun, G.; Pan, Y.; Kong, W.; Xu, Z.; Ma, J.; Racharak, T.; Nguyen, L.M.; Xin, J. DA-TransUNet: Integrating spatial and channel dual attention with transformer U-net for medical image segmentation. Front. Bioeng. Biotechnol. 2024, 12, 1398237. [Google Scholar] [CrossRef] [PubMed]
- Asiri, A.A.; Shaf, A.; Ali, T.; Shakeel, U.; Irfan, M.; Mehdar, K.M.; Halawani, H.T.; Alghamdi, A.H.; Alshamrani, A.F.A.; Alqhtani, S.M. Exploring the power of deep learning: Fine-tuned vision transformer for accurate and efficient brain tumor detection in MRI scans. Diagnostics 2023, 13, 2094. [Google Scholar] [CrossRef] [PubMed]
- Wang, J.; Lu, S.Y.; Wang, S.H.; Zhang, Y.D. RanMerFormer: Randomized vision transformer with token merging for brain tumor classification. Neurocomputing 2024, 573, 127216. [Google Scholar] [CrossRef]
- Zhang, X.; Zhou, X.; Lin, M.; Sun, J. Finding Extreme Points in Contours with OpenCV. PyImageSearch. 2020. Available online: https://www.pyimagesearch.com/2016/04/11/finding-extreme-points-in-contours-with-opencv (accessed on 10 August 2020).
- Perez, L.; Wang, J. The effectiveness of data augmentation in image classification using deep learning. arXiv 2017, arXiv:1712.04621. [Google Scholar] [CrossRef]
- Yang, S.; Xiao, W.; Zhang, M.; Guo, S.; Zhao, J.; Shen, F. Image data augmentation for deep learning: A survey. arXiv 2022, arXiv:2204.08610. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Wu, B.; Xu, C.; Dai, X.; Wan, A.; Zhang, P.; Yan, Z.; Tomizuka, M.; Gonzalez, J.; Keutzer, K.; Vajda, P. Visual transformers: Token-based image representation and processing for computer vision. arXiv 2020, arXiv:2006.03677. [Google Scholar] [CrossRef]
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 248–255. [Google Scholar]
- Erickson, B.J.; Korfiatis, P.; Akkus, Z.; Kline, T.L. Machine learning for medical imaging. Radiographics 2017, 37, 505–515. [Google Scholar] [CrossRef]
- Freund, Y.; Schapire, R.E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 1997, 55, 119–139. [Google Scholar] [CrossRef]
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
- Yu, T.; Zhu, H. Hyper-parameter optimization: A review of algorithms and applications. arXiv 2020, arXiv:2003.05689. [Google Scholar] [CrossRef]
- Tran, N.; Schneider, J.G.; Weber, I.; Qin, A.K. Hyper-parameter optimization in classification: To-do or not-to-do. Pattern Recognit. 2020, 103, 107245. [Google Scholar] [CrossRef]
- Claesen, M.; De Moor, B. Hyperparameter search in machine learning. arXiv 2015, arXiv:1502.02127. [Google Scholar] [CrossRef]
- Belete, D.M.; Huchaiah, M.D. Grid search in hyperparameter optimization of machine learning models for prediction of HIV/AIDS test results. Int. J. Comput. Appl. 2022, 44, 875–886. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; Volume 25. [Google Scholar]
Ref. | Solution | AI Model | Objective | Dataset Size | Feature Extraction/Preprocessing | Accuracy (%) | Remarks |
---|---|---|---|---|---|---|---|
[28] | Classical ML | PNN | Brain tumor detection | 25 MRI samples | k-means with fuzzy c-means | 90.0 | Small dataset; limited generalizability |
[29] | Classical ML | Feed-forward NN | Classification into normal and abnormal | 71 MRI samples | DWT | 95.8 | Dataset differs from the PNN study; not directly comparable
[30] | Classical ML | Probabilistic NN | Classification into normal and abnormal | 650 MRI samples | GLCM | 95.0 | Larger dataset; preprocessing details not fully reported |
[31] | Hybrid ML | Genetic Algorithm + SVM | Classification into normal and abnormal | 83 MRI samples | Wavelet-based features | 98.14 | Dataset and preprocessing differ from others |
[32] | Classical ML | SVM | Tumor detection | 41 MRI samples | Adaptive GLCM | 98.0 | Different dataset; comparison indicative only |
[33] | Deep Learning | CNN models | Detection and classification | 253 MRI samples | CNN | 97.2 | Dataset relatively small; different from others |
[34] | Deep Learning | ResNet | Classification | 3064 MRI samples | CNN | 98.69 | Large dataset; results comparable with other CNN-based methods |
[35] | Deep Learning | Transfer Learning | Classification | 3064 MRI samples | GoogleNet | 99.4 | Same dataset as ResNet; fair comparison |
[36] | Deep Learning | CNN | Detection and classification | 3064 MRI samples | CNN | 97.8 | Same dataset as ResNet; comparable |
[37] | Deep Learning | CNN | Detection and classification | 3064 MRI samples | CNN | 97.9 | Same dataset as ResNet; comparable |
[38] | Deep Learning | CNN | Detection and classification | 3064 MRI samples | CNN | 91.43 | Same dataset as ResNet; comparable |
[39] | Deep Learning | CNN | Classification | 220 MRI samples | CNN | 94.5 | Dataset much smaller; comparison indicative only |
Types | Number of Classes | Training Set | Test Set |
---|---|---|---|
BT-small-2c | 2 | 202 | 51 |
BT-large-2c | 2 | 2400 | 600 |
Model | Hyperparameter | Search Space | Type |
---|---|---|---|
XGBoost | max_depth; learning_rate; subsample; n_estimators | [3, 5, 7]; [0.1, 0.01, 0.001]; [0.5, 0.7, 1]; [100, 200, 300] | Discrete; Continuous; Continuous; Discrete
MLP | hidden_layer_sizes; activation; solver; max_iter; momentum | [(50,), (100, 22), (100, 100, 50), (100, 50, 36, 30), (100, 100, 200, 150, 100)]; [relu, tanh, logistic]; [adam, sgd, lbfgs]; [1000]; [0.9, 0.95, 0.99] | Discrete; Categorical; Discrete; Continuous
Gaussian NB | var_smoothing; priors | [1 , 1 , 1 , 1 , 1 ]; [None, [0.3, 0.7], [0.4, 0.6], [0.5, 0.5]] | Continuous; Continuous
AdaBoost | n_estimators; learning_rate | [50, 70, 90, 120, 180, 200]; [0.001, 0.01, 0.1, 1, 10] | Discrete; Continuous
KNN | n_neighbors; weights; algorithm; leaf_size; p; metric; n_jobs | list(range(1, 31)); [uniform, distance]; [auto, ball_tree, kd_tree, brute]; list(range(10, 51, 5)); [1, 2]; [euclidean, manhattan, minkowski]; [] | Discrete; Categorical; Discrete; Discrete; Categorical; Discrete
RF | n_estimators; max_depth; min_samples_split; min_samples_leaf; max_features; bootstrap; criterion; oob_score; random_state | [100, 200, 300, 400, 500]; [None, 10, 20, 30, 40, 50]; [2, 5, 10]; [1, 2, 4]; [auto, sqrt, log2]; [True, False]; [gini, entropy]; [True, False]; [42] | Discrete; Discrete; Discrete; Discrete; Categorical
SVM_linear | C; kernel; tol; class_weight; random_state | [0.1, 1, 10, 100, 1000]; [linear]; [1 , 1 , 1 ]; [None, balanced]; [42] | Continuous; Categorical; Discrete
SVM_sigmoid | kernel; C; gamma; coef0; tol; class_weight; shrinking; probability; cache_size; random_state | [sigmoid]; [0.1, 1, 10, 100]; [scale, auto]; [0.0, 0.1, 0.5, 1.0]; [1 , 1 , 1 ]; [None, balanced]; [True, False]; [True, False]; [200.0, 500.0, 100.0]; [42] | Continuous; Categorical; Continuous
SVM_RBF | C; gamma; kernel; class_weight; shrinking; probability; tol; cache_size; max_iter | [0.1, 1, 10, 100]; [scale, auto, 0.1, 1, 10]; [rbf]; [None, balanced]; [True, False]; [True, False]; [1 , 1 ]; [200, 500, 1000]; [-1, 1000, 5000] | Discrete; Categorical; Discrete
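These grids map directly onto scikit-learn's GridSearchCV. The sketch below runs a reduced version of the SVM_RBF search space on synthetic data (make_classification stands in for the extracted ViT features; the full grids above would be swapped in unchanged):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the extracted ViT feature matrix.
X, y = make_classification(n_samples=300, n_features=20, n_informative=8, random_state=42)

# Reduced SVM_RBF grid from the table above.
param_grid = {
    "kernel": ["rbf"],
    "C": [0.1, 1, 10, 100],
    "gamma": ["scale", "auto"],
    "class_weight": [None, "balanced"],
}

# Exhaustive grid search with 5-fold cross-validation.
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy", n_jobs=-1)
search.fit(X, y)
best_params, best_score = search.best_params_, search.best_score_
```

Grid search is exhaustive, so the combinatorial size of the full grids (e.g., SVM_RBF's ten hyperparameters) is what makes the tuning stage in Section 5.1 computationally significant.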
Kernel | Equation | Parameters |
---|---|---|
Linear | K(xᵢ, xⱼ) = xᵢᵀxⱼ | - |
Sigmoid | K(xᵢ, xⱼ) = tanh(γ xᵢᵀxⱼ + r) | γ, r (coef0) |
RBF | K(xᵢ, xⱼ) = exp(−γ ‖xᵢ − xⱼ‖²) | γ |
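For reference, the three SVM kernels (linear, sigmoid, RBF) can be evaluated by hand and checked against scikit-learn's pairwise implementations; the inputs and γ/coef0 values below are arbitrary illustrations:

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, rbf_kernel, sigmoid_kernel

x = np.array([[1.0, 2.0]])
z = np.array([[0.5, -1.0]])
gamma, coef0 = 0.5, 1.0

# Linear: K(x, z) = x . z
k_lin = linear_kernel(x, z).item()                     # 1*0.5 + 2*(-1) = -1.5

# Sigmoid: K(x, z) = tanh(gamma * (x . z) + coef0)
k_sig = sigmoid_kernel(x, z, gamma=gamma, coef0=coef0).item()
k_sig_ref = np.tanh(gamma * (x @ z.T).item() + coef0)  # tanh(0.25)

# RBF: K(x, z) = exp(-gamma * ||x - z||^2)
k_rbf = rbf_kernel(x, z, gamma=gamma).item()
k_rbf_ref = np.exp(-gamma * np.sum((x - z) ** 2))      # exp(-0.5 * 9.25)
```

The γ and coef0 symbols here are the same `gamma` and `coef0` entries tuned in the SVM grids above.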
ML Classifier Accuracy
Deep Feature from the Pre-Trained ViT Model | XGBoost | MLP | GaussianNB | AdaBoost | KNN | RFClassifier | SVM_linear | SVM_sigmoid | SVM_RBF | Average |
---|---|---|---|---|---|---|---|---|---|---|
vit_base_patch16_224 | 0.9001 | 0.9800 | 0.8515 | 0.9731 | 0.9652 | 0.9752 | 0.9705 | 0.9200 | 0.9712 | 0.9452 |
vit_base_patch32_224 | 0.8832 | 0.9771 | 0.8600 | 0.9785 | 0.9532 | 0.9652 | 0.9800 | 0.9441 | 0.9804 | 0.9469 |
vit_large_patch16_224 | 0.9221 | 0.9851 | 0.8536 | 0.9855 | 0.9702 | 0.9632 | 0.9831 | 0.9591 | 0.9732 | 0.9550 |
vit_small_patch32_224 | 0.8911 | 0.9728 | 0.8752 | 0.9623 | 0.9723 | 0.9501 | 0.9506 | 0.9199 | 0.9800 | 0.9416 |
deit3_small_patch16_224 | 0.8544 | 0.9900 | 0.7857 | 0.9513 | 0.9602 | 0.9451 | 0.9401 | 0.8408 | 0.9766 | 0.9160 |
vit_base_patch8_224 | 0.9121 | 0.9917 | 0.8469 | 0.9641 | 0.9532 | 0.9502 | 0.9700 | 0.8600 | 0.9802 | 0.9365 |
vit_tiny_patch16_224 | 0.9016 | 0.9850 | 0.8321 | 0.9607 | 0.9700 | 0.9500 | 0.9415 | 0.8722 | 0.9612 | 0.9305 |
vit_small_patch16_224 | 0.9024 | 0.9900 | 0.8400 | 0.9802 | 0.9739 | 0.9700 | 0.9602 | 0.9192 | 0.9709 | 0.9452 |
vit_base_patch16_384 | 0.9132 | 0.9680 | 0.8768 | 0.9700 | 0.9621 | 0.9699 | 0.9631 | 0.9504 | 0.9700 | 0.9493 |
vit_tiny_patch16_384 | 0.9056 | 0.9666 | 0.7854 | 0.9703 | 0.9700 | 0.9708 | 0.9523 | 0.8700 | 0.9904 | 0.9313 |
vit_small_patch32_384 | 0.9214 | 0.9835 | 0.8899 | 0.9733 | 0.9833 | 0.9667 | 0.9612 | 0.9212 | 0.9808 | 0.9535 |
vit_small_patch16_384 | 0.8932 | 0.9627 | 0.8244 | 0.9621 | 0.9801 | 0.9700 | 0.9510 | 0.9219 | 0.9803 | 0.9384 |
vit_base_patch32_384 | 0.9021 | 0.9732 | 0.8241 | 0.9800 | 0.9800 | 0.9623 | 0.9701 | 0.9505 | 0.9612 | 0.9448 |
Average | 0.9002 | 0.9789 | 0.8420 | 0.9701 | 0.9687 | 0.9622 | 0.9611 | 0.9115 | 0.9751 |
ML Classifier Accuracy
Deep Feature from the Pre-Trained ViT Model | XGBoost | MLP | GaussianNB | AdaBoost | KNN | RFClassifier | SVM_linear | SVM_sigmoid | SVM_RBF | Average |
---|---|---|---|---|---|---|---|---|---|---|
vit_base_patch16_224 | 0.9483 | 0.9917 | 0.865 | 0.9817 | 0.9867 | 0.98 | 0.9917 | 0.9383 | 0.99 | 0.9637 |
vit_base_patch32_224 ★ | 0.9517 | 0.995 | 0.8767 | 0.9883 | 0.9883 | 0.9783 | 0.9917 | 0.9633 | 0.995 | 0.9698 |
vit_large_patch16_224 ★ | 0.96 | 0.995 | 0.8683 | 0.9933 | 0.985 | 0.9833 | 0.9967 | 0.9833 | 0.9967 | 0.9735 |
vit_small_patch32_224 | 0.93 | 0.9867 | 0.8917 | 0.9833 | 0.9933 | 0.9683 | 0.9717 | 0.9367 | 0.995 | 0.9619 |
deit3_small_patch16_224 | 0.8717 | 0.9867 | 0.7983 | 0.96 | 0.97 | 0.9517 | 0.955 | 0.855 | 0.99 | 0.9265 |
vit_base_patch8_224 | 0.94 | 0.995 | 0.855 | 0.9833 | 0.985 | 0.9733 | 0.995 | 0.8817 | 0.9933 | 0.9557 |
vit_tiny_patch16_224 | 0.91 | 0.9867 | 0.865 | 0.9767 | 0.985 | 0.9717 | 0.9533 | 0.895 | 0.9867 | 0.9478 |
vit_small_patch16_224 | 0.94 | 0.9917 | 0.8583 | 0.9917 | 0.985 | 0.98 | 0.975 | 0.9383 | 0.9933 | 0.9615 |
vit_base_patch16_384 | 0.94 | 0.9883 | 0.895 | 0.9867 | 0.9867 | 0.9783 | 0.985 | 0.9633 | 0.985 | 0.9676 |
vit_tiny_patch16_384 | 0.9167 | 0.9917 | 0.8083 | 0.98 | 0.9883 | 0.9833 | 0.9733 | 0.8983 | 0.9933 | 0.9481 |
vit_small_patch32_384 | 0.9433 | 0.99 | 0.9067 | 0.9883 | 0.99 | 0.9867 | 0.975 | 0.9483 | 0.9967 | 0.9694 |
vit_small_patch16_384 | 0.9367 | 0.9883 | 0.8333 | 0.9817 | 0.99 | 0.99 | 0.975 | 0.945 | 0.9917 | 0.9591 |
vit_base_patch32_384 ★ | 0.9517 | 0.9933 | 0.8833 | 0.9917 | 0.985 | 0.9883 | 0.99 | 0.965 | 0.995 | 0.9715 |
Average | 0.9338 | 0.9908 | 0.8619 | 0.9836 | 0.986 | 0.9779 | 0.9791 | 0.9317 | 0.9924 |
ML Classifier Accuracy
Deep Feature from the Pre-Trained ViT Model | MLP | GaussianNB | AdaBoost | KNN | RFClassifier | SVM_linear | SVM_sigmoid | SVM_RBF | Average |
---|---|---|---|---|---|---|---|---|---|
vit_base_patch16_224 ★ | 0.9750 | 0.8944 | 0.9500 | 0.9105 | 0.9250 | 0.9750 | 0.9339 | 0.9750 | 0.9423 |
vit_base_patch32_224 | 0.9000 | 0.8460 | 0.9000 | 0.9355 | 0.9589 | 0.9016 | 0.9339 | 0.9339 | 0.9137 |
vit_large_patch16_224 | 0.9000 | 0.8460 | 0.8089 | 0.8194 | 0.9000 | 0.9750 | 0.9750 | 0.9750 | 0.8999 |
vit_small_patch32_224 ★ | 0.9500 | 0.9105 | 0.9339 | 0.9105 | 0.9339 | 0.9500 | 0.9589 | 0.9750 | 0.9403 |
deit3_small_patch16_224 | 0.8516 | 0.7032 | 0.8250 | 0.8371 | 0.8500 | 0.9427 | 0.7427 | 0.9016 | 0.8318 |
vit_base_patch8_224 | 0.9500 | 0.8782 | 0.9089 | 0.8855 | 0.9000 | 0.9339 | 0.9339 | 0.9500 | 0.9175 |
vit_tiny_patch16_224 | 0.9589 | 0.8944 | 0.8839 | 0.9032 | 0.9250 | 0.8516 | 0.9427 | 0.9750 | 0.9168 |
vit_small_patch16_224 ★ | 0.9750 | 0.8282 | 0.9750 | 0.9016 | 0.9500 | 0.9339 | 0.9750 | 0.9750 | 0.9392 |
vit_base_patch16_384 | 0.9339 | 0.9032 | 0.9500 | 0.8855 | 0.9500 | 0.9750 | 0.9339 | 0.9589 | 0.9363 |
vit_tiny_patch16_384 | 0.9266 | 0.8032 | 0.8839 | 0.8121 | 0.9250 | 0.8194 | 0.9339 | 0.9589 | 0.8829 |
vit_small_patch32_384 | 0.9427 | 0.9089 | 0.9339 | 0.9427 | 0.9339 | 0.9355 | 0.9339 | 0.9266 | 0.9323 |
vit_small_patch16_384 | 0.9500 | 0.8766 | 0.9500 | 0.8121 | 0.9177 | 0.9177 | 0.9589 | 0.9750 | 0.9198 |
vit_base_patch32_384 | 0.8750 | 0.8137 | 0.9089 | 0.9016 | 0.9250 | 0.9750 | 0.9339 | 0.9750 | 0.9135 |
Average | 0.9299 | 0.8543 | 0.9086 | 0.8813 | 0.9226 | 0.9297 | 0.9300 | 0.9581 |
ML Classifier Precision
Deep Feature from the Pre-Trained ViT Model | XGBoost | MLP | GaussianNB | AdaBoost | KNN | RFClassifier | SVM_linear | SVM_sigmoid | SVM_RBF |
---|---|---|---|---|---|---|---|---|---|
vit_base_patch16_224 | 0.9769 | 0.9900 | 0.8590 | 0.9801 | 0.9899 | 0.9800 | 0.9900 | 0.9550 | 0.9868 |
vit_base_patch32_224 | 0.9932 | 0.9933 | 0.9556 | 0.9966 | 0.9933 | 0.9965 | 0.9868 | 0.9728 | 0.9934 |
vit_large_patch16_224 | 0.9900 | 0.9933 | 0.9623 | 0.9901 | 0.9866 | 0.9866 | 0.9967 | 0.9966 | 1.0000 |
vit_small_patch32_224 | 0.9833 | 0.9834 | 0.9304 | 0.9801 | 0.9966 | 0.9668 | 0.9732 | 0.9226 | 0.9967 |
deit3_small_patch16_224 | 0.9655 | 0.9933 | 0.7788 | 0.9631 | 0.9829 | 0.9562 | 0.9627 | 0.8403 | 0.9933 |
vit_base_patch8_224 | 0.9832 | 0.9967 | 0.9112 | 0.9833 | 0.9899 | 0.9830 | 0.9967 | 0.8908 | 0.9966 |
vit_tiny_patch16_224 | 0.9732 | 0.9868 | 0.8687 | 0.9704 | 0.9834 | 0.9732 | 0.9416 | 0.8911 | 0.9834 |
vit_small_patch16_224 | 0.9770 | 0.9933 | 0.8746 | 0.9900 | 0.9834 | 0.9800 | 0.9831 | 0.9550 | 0.9966 |
vit_base_patch16_384 | 0.9737 | 0.9833 | 0.9187 | 0.9771 | 0.9834 | 0.9799 | 0.9866 | 0.9760 | 0.9866 |
vit_tiny_patch16_384 | 0.9867 | 0.9868 | 0.8201 | 0.9768 | 0.9835 | 0.9770 | 0.9733 | 0.9193 | 0.9966 |
vit_small_patch32_384 | 0.9867 | 0.9900 | 0.8789 | 0.9867 | 0.9933 | 0.9803 | 0.9766 | 0.9381 | 1.0000 |
vit_small_patch16_384 | 0.9834 | 0.9967 | 0.8425 | 0.9898 | 0.9868 | 0.9933 | 0.9766 | 0.9435 | 0.9966 |
vit_base_patch32_384 | 0.9899 | 0.9901 | 0.9197 | 0.9900 | 0.9866 | 0.9835 | 0.9836 | 0.9697 | 0.9934 |
ML Classifier Recall
Deep Feature from the Pre-Trained ViT Model | XGBoost | MLP | GaussianNB | AdaBoost | KNN | RFClassifier | SVM_linear | SVM_sigmoid | SVM_RBF |
---|---|---|---|---|---|---|---|---|---|
vit_base_patch16_224 | 0.9867 | 0.9900 | 0.8733 | 0.9833 | 0.9833 | 0.9800 | 0.9933 | 0.9200 | 0.9933 |
vit_base_patch32_224 | 0.9733 | 0.9933 | 0.7900 | 0.9800 | 0.9833 | 0.9600 | 0.9967 | 0.9533 | 0.9967 |
vit_large_patch16_224 | 0.9933 | 0.9933 | 0.7667 | 0.9967 | 0.9833 | 0.9800 | 0.9967 | 0.9700 | 0.9933 |
vit_small_patch32_224 | 0.9800 | 0.9867 | 0.8467 | 0.9867 | 0.9900 | 0.9700 | 0.9700 | 0.9533 | 0.9933 |
deit3_small_patch16_224 | 0.9333 | 0.9900 | 0.8333 | 0.9567 | 0.9567 | 0.9467 | 0.9467 | 0.8767 | 0.9867 |
vit_base_patch8_224 | 0.9733 | 0.9933 | 0.7867 | 0.9833 | 0.9800 | 0.9633 | 0.9933 | 0.8700 | 0.9900 |
vit_tiny_patch16_224 | 0.9700 | 0.9933 | 0.8600 | 0.9833 | 0.9867 | 0.9700 | 0.9667 | 0.9000 | 0.9900 |
vit_small_patch16_224 | 0.9900 | 0.9900 | 0.8367 | 0.9933 | 0.9867 | 0.9800 | 0.9667 | 0.9200 | 0.9900 |
vit_base_patch16_384 | 0.9867 | 0.9833 | 0.8667 | 0.9967 | 0.9900 | 0.9767 | 0.9833 | 0.9500 | 0.9833 |
vit_tiny_patch16_384 | 0.9900 | 0.9967 | 0.7900 | 0.9833 | 0.9933 | 0.9900 | 0.9733 | 0.8733 | 0.9900 |
vit_small_patch32_384 | 0.9867 | 0.9900 | 0.9433 | 0.9900 | 0.9867 | 0.9933 | 0.9733 | 0.9600 | 0.9933 |
vit_small_patch16_384 | 0.9867 | 0.9967 | 0.8200 | 0.9733 | 0.9933 | 0.9867 | 0.9733 | 0.9467 | 0.9867 |
vit_base_patch32_384 | 0.9800 | 1.0000 | 0.8400 | 0.9933 | 0.9833 | 0.9933 | 0.9967 | 0.9600 | 0.9967 |
ML Classifier F1-Score
Deep Feature from the Pre-Trained ViT Model | XGBoost | MLP | GaussianNB | AdaBoost | KNN | RFClassifier | SVM_linear | SVM_sigmoid | SVM_RBF |
---|---|---|---|---|---|---|---|---|---|
vit_base_patch16_224 | 0.9818 | 0.9900 | 0.8661 | 0.9817 | 0.9866 | 0.9800 | 0.9917 | 0.9372 | 0.9900 |
vit_base_patch32_224 | 0.9832 | 0.9933 | 0.8650 | 0.9882 | 0.9883 | 0.9779 | 0.9917 | 0.9630 | 0.9950 |
vit_large_patch16_224 | 0.9917 | 0.9933 | 0.8534 | 0.9934 | 0.9850 | 0.9833 | 0.9967 | 0.9831 | 0.9967 |
vit_small_patch32_224 | 0.9816 | 0.9850 | 0.8866 | 0.9834 | 0.9933 | 0.9684 | 0.9716 | 0.9377 | 0.9950 |
deit3_small_patch16_224 | 0.9492 | 0.9917 | 0.8052 | 0.9599 | 0.9696 | 0.9514 | 0.9546 | 0.8581 | 0.9900 |
vit_base_patch8_224 | 0.9782 | 0.9950 | 0.8444 | 0.9833 | 0.9849 | 0.9731 | 0.9950 | 0.8803 | 0.9933 |
vit_tiny_patch16_224 | 0.9716 | 0.9900 | 0.8643 | 0.9768 | 0.9850 | 0.9716 | 0.9539 | 0.8955 | 0.9867 |
vit_small_patch16_224 | 0.9834 | 0.9917 | 0.8552 | 0.9917 | 0.9850 | 0.9800 | 0.9748 | 0.9372 | 0.9933 |
vit_base_patch16_384 | 0.9801 | 0.9833 | 0.8919 | 0.9868 | 0.9867 | 0.9783 | 0.9850 | 0.9628 | 0.9850 |
vit_tiny_patch16_384 | 0.9884 | 0.9917 | 0.8048 | 0.9801 | 0.9884 | 0.9834 | 0.9733 | 0.8957 | 0.9933 |
vit_small_patch32_384 | 0.9867 | 0.9900 | 0.9100 | 0.9884 | 0.9900 | 0.9868 | 0.9750 | 0.9489 | 0.9967 |
vit_small_patch16_384 | 0.9850 | 0.9967 | 0.8311 | 0.9815 | 0.9900 | 0.9900 | 0.9750 | 0.9451 | 0.9916 |
vit_base_patch32_384 | 0.9849 | 0.9950 | 0.8780 | 0.9917 | 0.9850 | 0.9884 | 0.9901 | 0.9648 | 0.9950 |
ViT Feature | AdaBoost | GaussianNB | KNN | MLP | RFClassifier | SVM_RBF | SVM_linear | SVM_sigmoid |
---|---|---|---|---|---|---|---|---|
deit3_small_patch16_224 | 0.9600 ± 0.0158 | 0.7983 ± 0.0308 | 0.9700 ± 0.0142 | 0.9917 ± 0.0067 | 0.9517 ± 0.0167 | 0.9900 ± 0.0083 | 0.9550 ± 0.0167 | 0.8550 ± 0.0275 |
vit_base_patch16_224 | 0.9817 ± 0.0108 | 0.8650 ± 0.0250 | 0.9867 ± 0.0092 | 0.9900 ± 0.0075 | 0.9800 ± 0.0100 | 0.9900 ± 0.0075 | 0.9917 ± 0.0067 | 0.9383 ± 0.0192 |
vit_base_patch16_384 | 0.9867 ± 0.0092 | 0.8950 ± 0.0250 | 0.9867 ± 0.0092 | 0.9833 ± 0.0108 | 0.9783 ± 0.0117 | 0.9850 ± 0.0092 | 0.9850 ± 0.0100 | 0.9633 ± 0.0158 |
vit_base_patch32_224 | 0.9883 ± 0.0100 | 0.8767 ± 0.0267 | 0.9783 ± 0.0142 | 0.9950 ± 0.0058 | 0.9917 ± 0.0067 | 0.9950 ± 0.0050 | 0.9917 ± 0.0067 | 0.9633 ± 0.0158 |
vit_base_patch32_384 | 0.9917 ± 0.0067 | 0.8833 ± 0.0283 | 0.9850 ± 0.0108 | 0.9933 ± 0.0058 | 0.9883 ± 0.0083 | 0.9950 ± 0.0050 | 0.9900 ± 0.0083 | 0.9650 ± 0.0175 |
vit_large_patch16_224 | 0.9933 ± 0.0067 | 0.8683 ± 0.0250 | 0.9850 ± 0.0125 | 0.9950 ± 0.0058 | 0.9833 ± 0.0125 | 0.9967 ± 0.0042 | 0.9967 ± 0.0042 | 0.9833 ± 0.0158 |
vit_small_patch16_224 | 0.9917 ± 0.0067 | 0.8583 ± 0.0275 | 0.9850 ± 0.0125 | 0.9917 ± 0.0067 | 0.9800 ± 0.0100 | 0.9933 ± 0.0058 | 0.9750 ± 0.0125 | 0.9833 ± 0.0158 |
vit_small_patch16_384 | 0.9817 ± 0.0125 | 0.8333 ± 0.0292 | 0.9900 ± 0.0083 | 0.9883 ± 0.0083 | 0.9900 ± 0.0083 | 0.9917 ± 0.0067 | 0.9750 ± 0.0125 | 0.9483 ± 0.0208 |
vit_small_patch32_224 | 0.9833 ± 0.0117 | 0.8917 ± 0.0292 | 0.9933 ± 0.0067 | 0.9867 ± 0.0092 | 0.9683 ± 0.0167 | 0.9950 ± 0.0050 | 0.9717 ± 0.0142 | 0.9367 ± 0.0217 |
vit_small_patch32_384 | 0.9883 ± 0.0100 | 0.9067 ± 0.0258 | 0.9900 ± 0.0083 | 0.9900 ± 0.0100 | 0.9867 ± 0.0108 | 0.9967 ± 0.0042 | 0.9750 ± 0.0125 | 0.9483 ± 0.0208 |
vit_tiny_patch16_224 | 0.9767 ± 0.0108 | 0.8650 ± 0.0250 | 0.9850 ± 0.0125 | 0.9867 ± 0.0092 | 0.9717 ± 0.0158 | 0.9867 ± 0.0092 | 0.9533 ± 0.0183 | 0.8950 ± 0.0225 |
vit_tiny_patch16_384 | 0.9800 ± 0.0125 | 0.8083 ± 0.0317 | 0.9883 ± 0.0092 | 0.9917 ± 0.0067 | 0.9833 ± 0.0108 | 0.9933 ± 0.0058 | 0.9733 ± 0.0167 | 0.8983 ± 0.0250 |
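The ± margins in the table above behave like 95% normal-approximation (Wald) binomial intervals on test accuracy. A minimal sketch under that assumption; the test-set size of 600 is illustrative and not taken from the paper, so the computed margin only approximately matches the tabulated one:

```python
# 95% Wald confidence interval half-width for a test-set accuracy.
# n_test = 600 is an assumed, illustrative test-set size.
import math

def accuracy_ci(acc: float, n_test: int, z: float = 1.96) -> float:
    """Half-width of the z-level normal-approximation CI for an accuracy."""
    return z * math.sqrt(acc * (1.0 - acc) / n_test)

margin = accuracy_ci(0.9600, 600)
print(f"0.9600 ± {margin:.4f}")  # close to the 0.0158 margin tabulated above
```

As expected, the margin shrinks with the square root of the test-set size and grows as accuracy approaches 0.5.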
Deep Feature from the Pre-Trained ViT Model | MLP | GaussianNB | Adaboost | KNN | RFClassifier | SVM_linear | SVM_sigmoid | SVM_RBF
---|---|---|---|---|---|---|---|---
vit_base_patch16_224 + vit_small_patch32_224 | 0.9589 | 0.6129 | 0.9250 | 0.9177 | 0.9250 | 0.9750 | 0.9589 | 0.9750 |
vit_base_patch16_224 + vit_small_patch16_224 | 0.9750 | 0.5968 | 0.8266 | 0.8694 | 0.8589 | 0.9750 | 0.9750 | 0.9750 |
vit_small_patch32_224 + vit_small_patch16_224 | 0.9750 | 0.7427 | 0.9177 | 0.8782 | 0.9500 | 0.9750 | 0.9589 | 0.9750 |
vit_base_patch16_224 + vit_small_patch32_224 + vit_small_patch16_224 | 0.9750 | 0.5427 | 0.9589 | 0.9266 | 0.8782 | 0.9750 | 0.9750 | 0.9750 |
Deep Feature from the Pre-Trained ViT Model | MLP | GaussianNB | Adaboost | KNN | RFClassifier | SVM_linear | SVM_sigmoid | SVM_RBF
---|---|---|---|---|---|---|---|---
vit_large_patch16_224 + vit_base_patch32_384 | 0.9983 | 0.6700 | 0.9900 | 0.9883 | 0.9900 | 0.9967 | 0.9700 | 0.9967 |
vit_large_patch16_224 + vit_base_patch32_224 | 0.9983 | 0.6550 | 0.9883 | 0.9917 | 0.9717 | 0.9983 | 0.9800 | 0.9967 |
vit_base_patch32_384 + vit_base_patch32_224 | 0.9950 | 0.6983 | 0.9917 | 0.9900 | 0.9750 | 0.9950 | 0.9800 | 0.9967 |
vit_large_patch16_224 + vit_base_patch32_384 + vit_base_patch32_224 | 0.9950 | 0.6683 | 0.9883 | 0.9900 | 0.9833 | 0.9983 | 0.9867 | 0.9967 |
Deep Feature from the Pre-Trained ViT Model | MLP | GaussianNB | Adaboost | KNN | RFClassifier | SVM_linear | SVM_sigmoid | SVM_RBF
---|---|---|---|---|---|---|---|---
vit_base_patch16_224 + vit_small_patch32_224 | 0.9339 | 0.6129 | 0.9589 | – | 0.8177 | 0.9750 | 0.9589 | 0.9750
vit_base_patch16_224 + vit_small_patch16_224 | 0.9750 | 0.5645 | 0.8516 | 0.8694 | 0.7750 | 0.9750 | 0.9750 | 0.9750 |
vit_small_patch32_224 + vit_small_patch16_224 | 0.9500 | 0.7105 | 0.9177 | 0.8782 | 0.9339 | 0.9750 | 0.9589 | 0.9750 |
vit_base_patch16_224 + vit_small_patch32_224 + vit_small_patch16_224 | 0.9750 | 0.5427 | 0.9589 | 0.9266 | 0.8782 | 0.9750 | 0.9750 | 0.9750 |
Deep Feature from the Pre-Trained ViT Model | MLP | GaussianNB | Adaboost | KNN | RFClassifier | SVM_linear | SVM_sigmoid | SVM_RBF
---|---|---|---|---|---|---|---|---
vit_large_patch16_224 + vit_base_patch32_384 | 0.9967 | 0.6667 | 0.9917 | 0.9900 | 0.9883 | 0.9967 | 0.9717 | 0.9967 |
vit_large_patch16_224 + vit_base_patch32_224 | 0.9983 | 0.6550 | 0.9850 | 0.9917 | 0.9867 | 0.9983 | 0.9783 | 0.9967 |
vit_base_patch32_384 + vit_base_patch32_224 | 0.9983 | 0.6983 | 0.9933 | 0.9900 | 0.9833 | 0.9933 | 0.9817 | 0.9967 |
vit_large_patch16_224 + vit_base_patch32_384 + vit_base_patch32_224 | 0.9933 | 0.6600 | 0.9900 | 0.9900 | 0.9817 | 0.9983 | 0.9867 | 0.9967 |
Deep Feature from the Pre-Trained ViT Model | MLP | GaussianNB | Adaboost | KNN | RFClassifier | SVM_linear | SVM_sigmoid | SVM_RBF
---|---|---|---|---|---|---|---|---
vit_base_patch16_224 + vit_small_patch32_224 | 0.9750 | 0.6129 | 0.9089 | 0.9177 | 0.9500 | 0.9750 | 0.9589 | 0.9750 |
vit_base_patch16_224 + vit_small_patch16_224 | 0.9750 | 0.5645 | 0.8427 | 0.8694 | 0.8000 | 0.9750 | 0.9750 | 0.9750 |
vit_small_patch32_224 + vit_small_patch16_224 | 0.9750 | 0.7427 | 0.8927 | 0.8782 | 0.8839 | 0.9750 | 0.9589 | 0.9750 |
vit_base_patch16_224 + vit_small_patch32_224 + vit_small_patch16_224 | 0.9750 | 0.5427 | 0.9589 | 0.9266 | 0.8782 | 0.9750 | 0.9750 | 0.9750 |
Deep Feature from the Pre-Trained ViT Model | MLP | GaussianNB | Adaboost | KNN | RFClassifier | SVM_linear | SVM_sigmoid | SVM_RBF
---|---|---|---|---|---|---|---|---
vit_large_patch16_224 + vit_base_patch32_384 | 0.9967 | 0.6717 | 0.9883 | 0.9883 | 0.9883 | 0.9967 | 0.9750 | 0.9967 |
vit_large_patch16_224 + vit_base_patch32_224 | 0.9983 | 0.6483 | 0.9850 | 0.9917 | 0.9833 | 0.9983 | 0.9783 | 0.9967 |
vit_base_patch32_384 + vit_base_patch32_224 | 0.9967 | 0.6933 | 0.9900 | 0.9900 | 0.9767 | 0.9950 | 0.9817 | 0.9967 |
vit_large_patch16_224 + vit_base_patch32_384 + vit_base_patch32_224 | 0.9983 | 0.6633 | 0.9900 | 0.9900 | 0.9867 | 0.9983 | 0.9833 | 0.9967 |
Deep Feature from the Pre-Trained ViT Model | MLP | GaussianNB | Adaboost | KNN | RFClassifier | SVM_linear | SVM_sigmoid | SVM_RBF
---|---|---|---|---|---|---|---|---
vit_base_patch16_224 + vit_small_patch32_224 | 0.9750 | 0.6129 | 0.9250 | 0.9016 | 0.8750 | 0.9750 | 0.9589 | 0.9750 |
vit_base_patch16_224 + vit_small_patch16_224 | 0.9500 | 0.5806 | 0.8589 | 0.8694 | 0.8000 | 0.9750 | 0.9750 | 0.9750 |
vit_small_patch32_224 + vit_small_patch16_224 | 0.9339 | 0.7927 | 0.9000 | 0.8782 | 0.8839 | 0.9750 | 0.9589 | 0.9750 |
vit_base_patch16_224 + vit_small_patch32_224 + vit_small_patch16_224 | 0.9589 | 0.5427 | 0.9589 | 0.9266 | 0.8782 | 0.9750 | 0.9750 | 0.9750 |
Deep Feature from the Pre-Trained ViT Model | MLP | GaussianNB | Adaboost | KNN | RFClassifier | SVM_linear | SVM_sigmoid | SVM_RBF
---|---|---|---|---|---|---|---|---
vit_large_patch16_224 + vit_base_patch32_384 | 0.9967 | 0.6750 | 0.9883 | 0.9883 | 0.9817 | 0.9967 | 0.9750 | 0.9967 |
vit_large_patch16_224 + vit_base_patch32_224 | 0.9967 | 0.6550 | 0.9817 | 0.9917 | 0.9817 | 0.9983 | 0.9817 | 0.9967 |
vit_base_patch32_384 + vit_base_patch32_224 | 0.9967 | 0.6933 | 0.9850 | 0.9900 | 0.9817 | 0.9950 | 0.9817 | 0.9967 |
vit_large_patch16_224 + vit_base_patch32_384 + vit_base_patch32_224 | 0.9917 | 0.6700 | 0.9900 | 0.9900 | 0.9817 | 1.0000 | 0.9867 | 0.9967 |
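The feature-level ensembles above fuse the deep features of two or three ViT backbones before a single classifier is trained. A minimal sketch of that fusion by concatenation; the random arrays stand in for real ViT embeddings, and the dimensions (1024-d and 768-d, roughly matching vit_large and vit_base CLS features) are illustrative:

```python
# Feature-level ensemble: concatenate deep features from several backbones,
# then train one classifier on the fused vector. Synthetic arrays stand in
# for the real ViT embeddings.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 4, size=n)  # four tumor classes
# Stand-ins for e.g. vit_large_patch16_224 (1024-d) and vit_base_patch32_384 (768-d).
feats_a = rng.normal(size=(n, 1024)) + y[:, None] * 0.1
feats_b = rng.normal(size=(n, 768)) + y[:, None] * 0.1

fused = np.concatenate([feats_a, feats_b], axis=1)  # shape (n, 1792)
X_tr, X_te, y_tr, y_te = train_test_split(fused, y, test_size=0.25, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
acc = clf.fit(X_tr, y_tr).score(X_te, y_te)
print(f"fused-feature SVM_RBF accuracy: {acc:.4f}")
```

Standardizing before the SVM matters here because the concatenated backbones can produce features on different scales.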
ML Ensembling | vit_base_patch16_224 | vit_small_patch32_224 | vit_small_patch16_224 | vit_base_patch16_384 | vit_small_patch32_384
---|---|---|---|---|---
MLP + SVM_sigmoid | 0.9800 | 0.9483 | 0.9550 | 0.9783 | 0.9417 |
MLP + SVM_RBF | 0.9883 | 0.9883 | 0.9950 | 0.9850 | 0.9950 |
SVM_sigmoid + SVM_RBF | 0.9783 | 0.9533 | 0.9550 | 0.9767 | 0.9433 |
MLP + SVM_sigmoid + SVM_RBF | 0.9883 | 0.9883 | 0.9900 | 0.9883 | 0.9900 |
ML Ensembling | vit_large_patch16_224 | vit_base_patch32_384 | vit_base_patch32_224 | vit_small_patch32_384 | vit_base_patch16_384
---|---|---|---|---|---
KNN + MLP | 0.9900 | 0.9917 | 0.9917 | 0.9900 | 0.9833 |
KNN + SVM_RBF | 0.9917 | 0.9933 | 0.9933 | 0.9950 | 0.9850 |
MLP + SVM_RBF | 0.9950 | 0.9900 | 0.9950 | 0.9900 | 0.9883 |
KNN + MLP + SVM_RBF | 0.9950 | 0.9917 | 0.9950 | 0.9950 | 0.9900 |
ML Ensembling | vit_base_patch16_224 | vit_small_patch32_224 | vit_small_patch16_224 | vit_base_patch16_384 | vit_small_patch32_384
---|---|---|---|---|---
MLP + SVM_sigmoid | 0.9817 | 0.9467 | 0.9617 | 0.9767 | 0.9367 |
MLP + SVM_RBF | 0.9883 | 0.9950 | 0.9933 | 0.9883 | 0.9917
SVM_sigmoid + SVM_RBF | 0.9817 | 0.9517 | 0.9617 | 0.9767 | 0.9400
MLP + SVM_sigmoid + SVM_RBF | 0.9900 | 0.9917 | 0.9950 | 0.9883 | 0.9950
ML Ensembling | vit_large_patch16_224 | vit_base_patch32_384 | vit_base_patch32_224 | vit_small_patch32_384 | vit_base_patch16_384
---|---|---|---|---|---
KNN + MLP | 0.9917 | 0.9950 | 0.9917 | 0.9883 | 0.9850 |
KNN + SVM_RBF | 0.9917 | 0.9933 | 0.9933 | 0.9950 | 0.9850 |
MLP + SVM_RBF | 0.9950 | 0.9917 | 0.9950 | 0.9967 | 0.9900 |
KNN + MLP + SVM_RBF | 0.9967 | 0.9933 | 0.9933 | 0.9950 | 0.9900 |
ML Ensembling | vit_base_patch16_224 | vit_small_patch32_224 | vit_small_patch16_224 | vit_base_patch16_384 | vit_small_patch32_384
---|---|---|---|---|---
MLP + SVM_sigmoid | 0.9817 | 0.9450 | 0.9550 | 0.9733 | 0.9400
MLP + SVM_RBF | 0.9883 | 0.9917 | 0.9933 | 0.9900 | 0.9883
SVM_sigmoid + SVM_RBF | 0.9817 | 0.9500 | 0.9567 | 0.9750 | 0.9467
MLP + SVM_sigmoid + SVM_RBF | 0.9883 | 0.9900 | 0.9917 | 0.9883 | 0.9917
ML Ensembling | vit_large_patch16_224 | vit_base_patch32_384 | vit_base_patch32_224 | vit_small_patch32_384 | vit_base_patch16_384
---|---|---|---|---|---
KNN + MLP | 0.9900 | 0.9950 | 0.9950 | 0.9883 | 0.9883 |
KNN + SVM_RBF | 0.9900 | 0.9933 | 0.9933 | 0.9950 | 0.9850 |
MLP + SVM_RBF | 0.9933 | 0.9950 | 0.9933 | 0.9950 | 0.9900 |
KNN + MLP + SVM_RBF | 0.9967 | 0.9917 | 0.9950 | 0.9950 | 0.9883 |
ML Ensembling | vit_base_patch16_224 | vit_small_patch32_224 | vit_small_patch16_224 | vit_base_patch16_384 | vit_small_patch32_384
---|---|---|---|---|---
MLP + SVM_sigmoid | 0.9817 | 0.9517 | 0.9550 | 0.9733 | 0.9417 |
MLP + SVM_RBF | 0.9883 | 0.9950 | 0.9950 | 0.9883 | 0.9933 |
SVM_sigmoid + SVM_RBF | 0.9817 | 0.9517 | 0.9567 | 0.9750 | 0.9433 |
MLP + SVM_sigmoid + SVM_RBF | 0.9900 | 0.9917 | 0.9950 | 0.9867 | 0.9900 |
ML Ensembling | vit_large_patch16_224 | vit_base_patch32_384 | vit_base_patch32_224 | vit_small_patch32_384 | vit_base_patch16_384
---|---|---|---|---|---
KNN + MLP | 0.9917 | 0.9950 | 0.9967 | 0.9917 | 0.9883 |
KNN + SVM_RBF | 0.9917 | 0.9933 | 0.9933 | 0.9950 | 0.9850 |
MLP + SVM_RBF | 0.9950 | 0.9917 | 0.9933 | 0.9950 | 0.9883 |
KNN + MLP + SVM_RBF | 0.9967 | 0.9933 | 0.9967 | 0.9950 | 0.9883 |
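The classifier-level ensembles above (e.g. KNN + MLP + SVM_RBF) combine the predictions of several fitted classifiers over the same deep features. A minimal sketch using soft probability voting; voting is an assumption here, and the paper's exact combination rule may differ:

```python
# Classifier-level ensemble: combine several classifiers by soft voting
# (averaging predicted class probabilities). Synthetic data stands in for
# the real ViT features.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=64, n_informative=24,
                           n_classes=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("mlp", MLPClassifier(max_iter=500, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        # probability=True is required so the SVM can contribute to soft voting
        ("svm_rbf", SVC(kernel="rbf", probability=True, random_state=0)),
    ],
    voting="soft",
)
acc = ensemble.fit(X_tr, y_tr).score(X_te, y_te)
print(f"KNN + MLP + SVM_RBF soft-vote accuracy: {acc:.4f}")
```

Soft voting lets a confident member outweigh two uncertain ones, which is one reason the three-member rows above rarely fall below their best single member.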
Stage | Time | GPU Memory Usage | Notes |
---|---|---|---|
Training with grid search | ∼18–22 h (entire dataset) | ∼22 GB | One-time process; hyperparameter optimization only |
Inference (single scan) | 0.45–0.60 s | ∼3.5 GB | Grid search not required; real-time feasible |
Inference (batch of 16) | 6–8 s | ∼5.2 GB | Scales linearly with batch size |
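Per-scan latency figures like those above can be collected with a simple wall-clock harness. The predictor below is a placeholder, not the paper's pipeline; in practice it would wrap the ViT feature extractor plus the fitted classifier ensemble:

```python
# Minimal latency harness for inference timing. `predict_stub` is a
# hypothetical stand-in for the real feature-extraction + classification call.
import time

def predict_stub(batch):
    # Placeholder inference: returns one dummy label per scan in the batch.
    return [0 for _ in batch]

def time_per_scan(batch, n_runs=10):
    """Average wall-clock seconds per scan over n_runs repeated calls."""
    start = time.perf_counter()
    for _ in range(n_runs):
        predict_stub(batch)
    elapsed = time.perf_counter() - start
    return elapsed / (n_runs * len(batch))

per_scan = time_per_scan(batch=list(range(16)))
print(f"{per_scan * 1e6:.1f} µs per scan (stub model)")
```

Averaging over repeated runs and dividing by batch size is what makes the batch-of-16 row scale roughly linearly from the single-scan row.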
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ullah, Z.; Kim, J. Hierarchical Deep Feature Fusion and Ensemble Learning for Enhanced Brain Tumor MRI Classification. Mathematics 2025, 13, 2787. https://doi.org/10.3390/math13172787