Hybrid Multi-Label Classification Model for Medical Applications Based on Adaptive Synthetic Data and Ensemble Learning
Abstract
1. Introduction
Organization of This Research Work
2. Literature Review
2.1. Contribution to the Work
- (i) Adaptive synthetic sampling (ADASYN) generates synthetic data to balance the class labels of the input data;
- (ii) Min-max normalization rescales the input data to a common range, which increases the accuracy of the proposed classification model;
- (iii) Significant features are selected, avoiding redundant and noisy data in the samples, using velocity-equalized particle swarm optimization. Standard particle swarm optimization suffers from premature convergence; the proposed velocity-equalized particle swarm optimization model addresses this by balancing the velocity (v) across each dimension of the problem;
- (iv) Multi-label classification is performed by an ensemble of Adaptive Neuro-Fuzzy Inference System, Probabilistic Neural Network, and Clustering-Based Decision Tree models whose outputs are combined by averaging.
2.2. Motivation
3. Proposed Methodology
3.1. Data Balancing Using ADAptive SYNthetic (ADASYN) Sampling
Algorithm 1: ADASYN |
Step 1: Start. Step 2: Input: a training data set with $m$ samples $\{x_i, y_i\}$, $i = 1, \dots, m$, where $x_i$ is an instance in the $n$-dimensional feature space $X$ and $y_i$ is the class identity label assigned to $x_i$. The numbers of minority class and majority class examples are denoted $m_s$ and $m_l$, respectively. Equation (1) computes the degree of class imbalance: $d = m_s / m_l$ (1). Step 3: If $d < d_{th}$, where $d_{th}$ is the preset threshold for the maximum tolerated degree of class imbalance, then:
For each minority example $x_i$: find its $K$ nearest neighbors, randomly choose one minority example $x_{zi}$ among them, and create a synthetic example as in Equation (5): $s_i = x_i + (x_{zi} - x_i) \times \lambda$ (5), where $\lambda \in [0, 1]$ is a random number. End Loop. Step 4: End. |
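The ADASYN steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' implementation (the paper reports using MATLAB). The function name `adasyn`, the parameters `k`, `beta`, `d_th`, and `seed`, and the choice to interpolate only toward minority-class neighbours are assumptions made for the sketch.

```python
import math
import random

def adasyn(X, y, minority_label=1, k=5, beta=1.0, d_th=0.75, seed=0):
    """Generate synthetic minority samples; returns a list of feature lists."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    m_s = len(minority)
    m_l = len(X) - m_s
    if m_s < 2 or m_l == 0:
        return []
    d = m_s / m_l                      # Equation (1): degree of class imbalance
    if d >= d_th:
        return []                      # imbalance is within the tolerated cut-off
    G = (m_l - m_s) * beta             # total number of synthetic samples wanted

    def nearest(pool, x, count):
        # Indices of the `count` points of `pool` closest to x, excluding x itself.
        order = sorted(range(len(pool)), key=lambda j: math.dist(pool[j], x))
        return [j for j in order if pool[j] is not x][:count]

    # Share of majority points among the k nearest neighbours of each minority sample.
    r = []
    for x in minority:
        nn = nearest(X, x, k)
        r.append(sum(y[j] != minority_label for j in nn) / k)
    total = sum(r)
    weights = [ri / total for ri in r] if total > 0 else [1 / m_s] * m_s

    synthetic = []
    for x, w in zip(minority, weights):
        nn = nearest(minority, x, k)   # assumption: interpolate toward minority neighbours only
        for _ in range(round(w * G)):
            x_z = minority[rng.choice(nn)]
            lam = rng.random()         # random lambda in [0, 1)
            # Equation (5): new point on the segment between x and x_z
            synthetic.append([a + (b - a) * lam for a, b in zip(x, x_z)])
    return synthetic
```

Minority samples that sit among many majority neighbours receive a larger share of the synthetic points, which is the adaptive part of the method. With `beta=1.0` the sketch aims for roughly equal class sizes after resampling.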
3.2. Pre-Processing Using Min-Max Normalization
Algorithm 2: Min-Max normalization |
Input: Pima Indians, Yeast 1, and New-thyroid 1 datasets. Output: Normalized values for all datasets. Step 1: Start. Take the maximum value from the array. Step 2: Take the minimum value from the array. Step 3: Estimate and show the average value of the array and the number of values that are larger than the average. Step 4: Estimate and show the normalized values of the original array values using Equation (6): $x' = \dfrac{x - x_{\min}}{x_{\max} - x_{\min}}$ (6). Step 5: End. |
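As a small illustration of the normalization step, the sketch below rescales each feature column to a target range using the standard min-max formula. The function names and the `new_min`/`new_max` parameters are hypothetical, and mapping a constant column to `new_min` is an assumption made to avoid division by zero.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale a list of numbers linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no information, map it to new_min.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]

def normalize_columns(rows):
    """Apply min-max normalization to every feature column of a row-major table."""
    columns = list(zip(*rows))
    normalized = [min_max_normalize(col) for col in columns]
    return [list(row) for row in zip(*normalized)]
```

For example, `min_max_normalize([2, 4, 6])` maps the smallest value to 0, the largest to 1, and the middle value to 0.5. In practice the minimum and maximum would be taken from the training split only and reused for the test split.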
3.3. Feature Selection Using Velocity Equalized Particle Swarm Optimization
3.3.1. Particle Swarm Optimization
3.3.2. Velocity Equalized Particle Swarm Optimization
Initial Population
Solution Representation
Fitness Function
Position Update
Algorithm 3: Velocity equalized particle swarm optimization |
Step 1: Start. Step 2: Swarm (job) initialization: randomly initialize the position and velocity of each particle. Step 3: Evaluate the fitness (classification accuracy) of each particle (feature): if the fitness of xi > the fitness of pbesti, set pbesti = xi; if the fitness of pbesti > the fitness of gbest, set gbest = pbesti. Step 4: Update the velocity of particle (feature) i. Step 5: Update the position of particle (feature) i. Step 6: Return gbest and its fitness value (classification accuracy). Step 7: End. |
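The loop in Algorithm 3 can be illustrated with a generic binary particle swarm optimizer for feature selection. The exact velocity-equalization rule is not given in this outline, so the sketch substitutes plain per-dimension velocity clamping to `[-v_max, v_max]`; that substitution, the sigmoid transfer function, and all parameter names and defaults are assumptions. The `fitness` callable stands in for the classification accuracy obtained with a given feature mask.

```python
import math
import random

def pso_feature_selection(fitness, n_features, n_particles=10, n_iter=30,
                          w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Binary PSO sketch: returns (best feature mask, its fitness)."""
    rng = random.Random(seed)
    # Random initial positions (one boolean per feature) and velocities.
    pos = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(n_features):
                # Velocity update, clamped per dimension (stand-in for equalization).
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-v_max, min(v_max, v))
                # Position update: select feature j with sigmoid(velocity) probability.
                pos[i][j] = rng.random() < 1.0 / (1.0 + math.exp(-vel[i][j]))
            # Fitness evaluation and personal/global best updates.
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit
```

A caller would pass a function that trains and scores a classifier on the selected columns, for instance a cross-validated accuracy. Clamping keeps any single dimension from dominating the search, which is one common way to delay premature convergence.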
3.4. Multi-Label Classification Using Ensemble Classification
3.4.1. Adaptive Neuro-Fuzzy Inference System (ANFIS)
3.4.2. Probabilistic Neural Network
3.4.3. Clustering-Based Decision Tree
3.4.4. Ensembling
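The contribution list describes combining the ANFIS, Probabilistic Neural Network, and Clustering-Based Decision Tree outputs by averaging. A minimal sketch of that combination for one sample is shown below. It assumes each base model already produces one score per label in [0, 1]; the function name and the 0.5 decision threshold are hypothetical choices for illustration.

```python
def average_ensemble(score_sets, threshold=0.5):
    """Average per-label scores from several models and threshold the result.

    score_sets: one list of per-label scores per base model (same length each).
    Returns (averaged scores, binary label decisions).
    """
    n_models = len(score_sets)
    n_labels = len(score_sets[0])
    averaged = [sum(scores[j] for scores in score_sets) / n_models
                for j in range(n_labels)]
    decisions = [int(a >= threshold) for a in averaged]
    return averaged, decisions

# Hypothetical per-label scores from the three base models for one sample.
anfis_scores = [1.0, 0.25]
pnn_scores = [0.5, 0.25]
cdt_scores = [0.75, 0.0]
averaged, decisions = average_ensemble([anfis_scores, pnn_scores, cdt_scores])
```

Unweighted averaging treats the three models as equally reliable; a weighted average would be a natural variation if one model were known to be stronger on a given dataset.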
4. Results and Discussion
4.1. Experimental Setup and Comparative Analysis
4.2. Performance Metrics
- (1) Precision
- (2) Recall
- (3) Accuracy
- (4) F measure
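The four metrics listed above have standard definitions in terms of true/false positive and negative counts. The sketch below computes them for a single label; the function names are hypothetical, and returning 0.0 when a denominator is zero is an assumption made for the sketch.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy, and F measure from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f_measure": f_measure(precision, recall),
    }
```

As a consistency check, `f_measure(92.10, 96.73)` evaluates to about 94.36, which agrees with the F measure reported alongside that precision and recall for ASD-MLC in the results tables.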
4.3. Comparison of Proposed and Existing Models
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Tarekegn, A.N.; Giacobini, M.; Michalak, K. A review of methods for imbalanced multi-label classification. Pattern Recognit. 2021, 118, 107965.
- Schmitt, V.H.; Hobohm, L.; Münzel, T.; Wenzel, P.; Gori, T.; Keller, K. Impact of diabetes mellitus on mortality rates and outcomes in myocardial infarction. Diabetes Metab. 2021, 47, 101211.
- Manickum, P.; Mashamba-Thompson, T.; Naidoo, R.; Ramklass, S.; Madiba, T. Knowledge and practice of diabetic foot care–A scoping review. Diabetes Metab. Syndr. Clin. Res. Rev. 2021, 15, 783–793.
- Mishra, N.K.; Singh, P.K. Linear ordering problem-based classifier chain using genetic algorithm for multi-label classification. Appl. Soft Comput. 2022, 117, 108395.
- Zhao, D.; Wang, Q.; Zhang, J.; Bai, C. Mine Diversified Contents of Multi-Spectral Cloud Images Along With Geographical Information for Multi-label Classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–15.
- Liu, Z.; Tang, C.; Abhadiomhen, S.E.; Shen, X.J.; Li, Y. Robust Label and Feature Space Co-Learning for Multi-label Classification. IEEE Trans. Knowl. Data Eng. 2023, 1–14.
- Singh, K.; Sharma, B.; Singh, J.; Srivastava, G.; Sharma, S.; Aggarwal, A.; Cheng, X. Local statistics-based speckle reducing bilateral filter for medical ultrasound images. Mob. Netw. Appl. 2020, 25, 2367–2389.
- Huang, J.; Qian, W.; Vong, C.M.; Ding, W.; Shu, W.; Huang, Q. Multi-label Feature Selection via Label Enhancement and Analytic Hierarchy Process. IEEE Trans. Emerg. Top. Comput. Intell. 2023.
- Koundal, D.; Sharma, B.; Guo, Y. Intuitionistic based segmentation of thyroid nodules in ultrasound images. Comput. Biol. Med. 2020, 121, 103776.
- Mikolov, T.; Karafiát, M.; Burget, L.; Černocký, J.; Khudanpur, S. Recurrent neural network based language model. In Proceedings of the 11th Annual Conference of the International Speech Communication Association, Interspeech, Makuhari, Chiba, Japan, 26–30 September 2010; pp. 1045–1048.
- Lai, S.; Xu, L.; Liu, K.; Zhao, J. Recurrent convolutional neural networks for text classification. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015.
- Lin, Z.; Ding, G.; Han, J.; Shao, L. End-to-end feature-aware label space encoding for multi-label classification with many classes. IEEE Trans. Neural Netw. Learn. Syst. 2017, 29, 2472–2487.
- Wang, X.; Sun, L.; Wei, Z. An Improved Convolutional Neural Network Algorithm for Multi-label Classification. In Proceedings of the 2018 International Conference on Audio, Language and Image Processing (ICALIP), Shanghai, China, 16–17 July 2018; IEEE: New York, NY, USA, 2018; pp. 113–117.
- Yan, Y.; Wang, Y.; Gao, W.C.; Zhang, B.W.; Yang, C.; Yin, X.C. LSTM: Multi-label Ranking for Document Classification. Neural Process. Lett. 2018, 47, 117–138.
- Jindal, R. A Novel Method for Efficient Multi-label Text Categorization of research articles. In International Conference on Computing, Power and Communication Technologies (GUCON), Greater Noida, India, 28–29 September 2018; IEEE: New York, NY, USA, 2018; pp. 333–336.
- Charte, F.; Rivera, A.J.; del Jesus, M.J.; Herrera, F. REMEDIAL-HwR: Tackling multi-label imbalance through label decoupling and data resampling hybridization. Neurocomputing 2019, 326, 110–122.
- Alyousef, A.A.; Nihtyanova, S.; Denton, C.P.; Bosoni, P.; Bellazzi, R.; Tucker, A. Latent Class Multi-label Classification to Identify Subclasses of Disease for Improved Prediction. In Proceedings of the IEEE 32nd International Symposium on Computer-Based Medical Systems (CBMS), Cordoba, Spain, 5–7 June 2019; IEEE: New York, NY, USA, 2019; pp. 535–538.
- Wang, R.; Kwong, S.; Wang, X.; Jia, Y. Active k-label sets ensemble for multi-label classification. Pattern Recognit. 2021, 109, 107583.
- Che, X.; Chen, D.; Mi, J. Feature distribution-based label correlation in multi-label classification. Int. J. Mach. Learn. Cybern. 2021, 12, 1705–1719.
- Sun, L.; Wang, T.; Ding, W.; Xu, J.; Lin, Y. Feature selection using Fisher score and multi-label neighborhood rough sets for multi-label classification. Inf. Sci. 2021, 578, 887–912.
- Huang, J.; Vong, C.M.; Chen, C.P.; Zhou, Y. Accurate and Efficient Large-Scale Multi-label Learning With Reduced Feature Broad Learning System Using Label Correlation. IEEE Trans. Neural Netw. Learn. Syst. 2022, 1–14.
- Bayati, H.; Dowlatshahi, M.B.; Hashemi, A. MSSL: A memetic-based sparse subspace learning algorithm for multi-label classification. Int. J. Mach. Learn. Cybern. 2022, 13, 3607–3624.
- Zhu, X.; Li, J.; Ren, J.; Wang, J.; Wang, G. Dynamic ensemble learning for multi-label classification. Inf. Sci. 2023, 623, 94–111.
- Zhang, J.; Liu, K.; Yang, X.; Ju, H.; Xu, S. Multi-label learning with Relief-based label-specific feature selection. Appl. Intell. 2023, 53, 18517–18530.
- Ghane, S.; Bhorade, N.; Chitre, N.; Poyekar, B.; Mote, R.; Topale, P. Diabetes Prediction using Feature Extraction and Machine Learning Models. In Proceedings of the Second International Conference on Electronics and Sustainable Communication Systems (ICESC), Coimbatore, India, 4–6 August 2021; IEEE: New York, NY, USA, 2021; pp. 1652–1657.
- Idarraga, A.J.; Luong, G.; Hsiao, V.; Schneider, D.F. False negative rates in benign thyroid nodule diagnosis: Machine learning for detecting malignancy. J. Surg. Res. 2021, 268, 562–569.
- Prabha, A.; Yadav, J.; Rani, A.; Singh, V. Design of intelligent diabetes mellitus detection system using hybrid feature selection based XGBoost classifier. Comput. Biol. Med. 2021, 136, 104664.
- Kumari, S.; Kumar, D.; Mittal, M. An ensemble approach for classification and prediction of diabetes mellitus using the soft voting classifier. Int. J. Cogn. Comput. Eng. 2021, 2, 40–46.
- Joseph, L.P.; Joseph, E.A.; Prasad, R. Explainable diabetes classification using hybrid Bayesian-optimized TabNet architecture. Comput. Biol. Med. 2022, 151, 106178.
- Zhao, T.; Chen, C.; Cao, H. An evolutionary self-organizing fuzzy system using fuzzy-classification-based social learning particle swarm optimization. Inf. Sci. 2022, 606, 92–111.
- Zhang, L.; Lim, C.P.; Yu, Y.; Jiang, M. Sound classification using evolving ensemble models and Particle Swarm Optimization. Appl. Soft Comput. 2022, 116, 108322.
- Dhiman, P.; Kukreja, V.; Manoharan, P.; Kaur, A.; Kamruzzaman, M.M.; Dhaou, I.B.; Iwendi, C. A novel deep learning model for detection of severity level of the disease in citrus fruits. Electronics 2022, 11, 495.
- Kukreja, V.; Dhiman, P. A Deep Neural Network based disease detection scheme for Citrus fruits. In Proceedings of the 2020 International Conference on Smart Electronics and Communication (ICOSEC), Trichy, India, 10–12 September 2020; pp. 97–101.
- Rani, S.; Malu, G.; Sherly, E. Kidney Stone Detection from CT images using Probabilistic Neural Network (PNN) and Watershed Algorithm. In Proceedings of the International Conference on Advances in Intelligent Computing and Applications (AICAPS), Kerala, India, 1–3 February 2023; IEEE: New York, NY, USA, 2023; pp. 1–6.
- Nagi, R.; Tripathy, S.S. Plant disease identification using fuzzy feature extraction and PNN. Signal Image Video Process. 2023, 17, 2809–2815.
- Thakur, A.K.; Mukherjee, A.; Kundu, P.K.; Das, A. Classification and Authentication of Induction Motor Faults using Time and Frequency Feature Dependent Probabilistic Neural Network Model. J. Inst. Eng. Ser. B 2023, 104, 623–640.
- Tufail, A.B.; Ma, Y.K.; Zhang, Q.N.; Khan, A.; Zhao, L.; Yang, Q.; Adeel, M.; Khan, R.; Ullah, I. 3D convolutional neural networks-based multi-class classification of Alzheimer’s and Parkinson’s diseases using PET and SPECT neuroimaging modalities. Brain Inform. 2021, 8, 1–9.
- Suthar, V.; Vakharia, V.; Patel, V.K.; Shah, M. Detection of compound faults in ball bearings using multiscale-SinGAN, heat transfer search optimization, and extreme learning machine. Machines 2022, 11, 29.
- Mahesh, T.R.; Dhilip Kumar, V.; Vinoth Kumar, V.; Asghar, J.; Geman, O.; Arulkumaran, G.; Arun, N. AdaBoost ensemble methods using K-fold cross validation for survivability with the early detection of heart disease. Comput. Intell. Neurosci. 2022, 2022, 9005278.

Author | Approaches | Discussion | Demerits |
---|---|---|---|
Wang et al. [13] (2018) | Multi-label classification using an improved convolutional neural network algorithm | Achieves an average labeling accuracy above 93% | Classifying images in various positions remains difficult |
Jindal [15] (2018) | A novel method for multi-label text document categorization that is both automatic and effective | Yields reasonable performance, achieving an accuracy of 75% | Word embedding does not discriminate between different word senses |
Charte et al. [16] (2019) | Tackling multi-label imbalance using label decoupling and data resampling hybridization | Hamming loss and ranking loss are minimized | Performance degrades on large datasets |
Alyousef et al. [17] (2019) | Identification of illness subclasses using latent class multi-label classification for better prediction | The “Latent Class Multi-label Classification Model” increases accuracy compared with contemporary techniques | Unsuitable for datasets with a huge number of labels, owing to the massive exploration space |
Wang et al. [18] (2020) | Active k-label sets ensemble | Feasible and effective | Training efficiency still needs further improvement |
Che et al. [19] (2021) | FL-MLC | Effective and diverse for multi-label classification | Increases the time complexity |
Sun et al. [20] (2021) | Margin-based MNRS model | Effective and feasible | Increases the false positive rate |
Huang et al. [21] (2022) | Correlation-based label thresholding | Produces better performance | Not evaluated on high-volume data |
Bayati et al. [22] (2022) | Subspace learning and memetic algorithm | Superior to the compared methods | Increases the false positive rate |
Zhu et al. [23] (2023) | Dynamic ensemble learning | Outperforms the state-of-the-art methods | Time-consuming |
Zhang et al. [24] (2023) | Relief-LIFT | Achieves better performance | Does not apply to all applications |

Name of the Data Set | Attributes (Real/Integer/Nominal) | Examples | Imbalance Ratio |
---|---|---|---|
Pima Indians Data Set | 8 (8/0/0) | 768 | 1.87 |
Yeast 1 Data Set | 8 (8/0/0) | 1484 | 2.46 |
New-thyroid 1 Data Set | 5 (4/1/0) | 215 | 5.14 |

Serial No. | Software Component | Component Description |
---|---|---|
1. | Coding Language | MATLAB 2013a |

Serial No. | Hardware Component | Component Description |
---|---|---|
1. | System | Intel Core Processor |
2. | Hard Disk | 40 GB |
3. | Floppy Drive | 44 MB |
4. | Monitor | 15 VGA Colour |
5. | RAM | 512 MB |

Metrics | PCT | Homer | ML-F | BARF-MLC | FSEA-MLC | ASD-MLC |
---|---|---|---|---|---|---|
Accuracy | 69.53 | 72.26 | 83.07 | 87.23 | 88.67 | 90.23 |
Precision | 67.98 | 70.34 | 76.23 | 80.59 | 90.56 | 92.10 |
Recall | 69.32 | 71.23 | 83.93 | 85.71 | 94.32 | 96.73 |
F measure | 68.56 | 73.15 | 79.90 | 83.07 | 92.41 | 94.36 |

Metrics | PCT | Homer | ML-F | BARF-MLC | FSEA-MLC | ASD-MLC |
---|---|---|---|---|---|---|
Accuracy | 65.56 | 70.65 | 82.29 | 86.71 | 88.67 | 90.88 |
Precision | 62.54 | 69.97 | 80.44 | 85.09 | 90.56 | 92.13 |
Recall | 65.65 | 71.35 | 81.89 | 86.77 | 94.32 | 96.95 |
F measure | 68.89 | 73.78 | 81.16 | 85.92 | 92.41 | 94.48 |

Metrics | PCT | Homer | ML-F | BARF-MLC | FSEA-MLC | ASD-MLC |
---|---|---|---|---|---|---|
Accuracy | 67.44 | 71.56 | 83.55 | 86.42 | 88.12 | 90.07 |
Precision | 68.56 | 70.56 | 73.94 | 76.86 | 90.54 | 92.09 |
Recall | 66.56 | 72.54 | 84.36 | 86.09 | 94.05 | 96.68 |
F measure | 67.43 | 72.54 | 78.81 | 81.21 | 92.26 | 94.33 |

Dataset | Metrics | ASDMLC | ASDMLC + VPSO |
---|---|---|---|
Thyroid | Accuracy (%) | 89.12 | 90.07 |
Thyroid | Precision (%) | 91.23 | 92.09 |
Thyroid | Recall (%) | 94.67 | 96.68 |
Thyroid | F-measure (%) | 93 | 94.33 |
Pima | Accuracy (%) | 88.90 | 90.88 |
Pima | Precision (%) | 91 | 92.13 |
Pima | Recall (%) | 94.89 | 96.95 |
Pima | F-measure (%) | 93.12 | 94.48 |
Yeast | Accuracy (%) | 88.98 | 90.23 |
Yeast | Precision (%) | 90.45 | 92.10 |
Yeast | Recall (%) | 96.73 | 94.98 |
Yeast | F-measure (%) | 93.10 | 94.36 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Priyadharshini, M.; Banu, A.F.; Sharma, B.; Chowdhury, S.; Rabie, K.; Shongwe, T. Hybrid Multi-Label Classification Model for Medical Applications Based on Adaptive Synthetic Data and Ensemble Learning. Sensors 2023, 23, 6836. https://doi.org/10.3390/s23156836