Enhanced Feature Selection via Hierarchical Concept Modeling
Abstract
1. Introduction
2. Related Works
3. The Proposed Models for Feature Selection
Algorithm 1: Feature selection for FCA and decision tree.
Input: the training dataset comprising all attributes and n instances; a tree T (decision tree or concept lattice) with k levels; a threshold z (for example, z = 0.01).
Output: the set of selected attributes F.
Method:
1. F ← ∅
2. Acc_prev ← 0
3. // Acc_l is the accuracy obtained using the attributes selected down to level l
4. For l = 1 to k
5.     F_l ← F ∪ {attributes appearing at level l of T}
6.     Acc_l ← classification accuracy using F_l
7.     diff ← Acc_l − Acc_prev
8.     If (diff ≥ z)
9.         F ← F_l
10.        Acc_prev ← Acc_l
11.    Else Break()
12. End For
13. Return (F)
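The level-wise loop of Algorithm 1 can be sketched in Python. This is a minimal illustration, not the authors' implementation: the `levels` list and the `evaluate` callback are hypothetical stand-ins for the decision tree or concept lattice levels and for the classifier accuracy (DT, SVM, or ANN in the paper), and the toy accuracy numbers are invented.

```python
def select_features(levels, evaluate, z=0.01):
    """Greedy level-wise selection: grow the attribute set one tree level
    at a time and stop once the accuracy gain falls below threshold z."""
    selected, prev_acc = [], 0.0
    for level_attrs in levels:
        candidate = selected + [a for a in level_attrs if a not in selected]
        acc = evaluate(candidate)
        if acc - prev_acc >= z:   # this level improves accuracy enough: keep it
            selected, prev_acc = candidate, acc
        else:                     # gain below z: stop descending the tree
            break
    return selected, prev_acc

# Toy evaluator (hypothetical numbers): accuracy rises through the first two
# levels, then nearly plateaus, so the loop should stop after level 2.
def toy_accuracy(attrs):
    return {1: 0.70, 3: 0.85, 4: 0.855}[len(attrs)]

levels = [["a"], ["b", "c"], ["d"]]   # attributes introduced at each level
selected, acc = select_features(levels, toy_accuracy, z=0.01)
# selected == ["a", "b", "c"], acc == 0.85
```

With z = 0.01 (one of the thresholds the paper evaluates), the third level's 0.005 gain fails the test and the search stops, which is exactly the early-exit behavior of the Break() in line 11 of Algorithm 1.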
4. Research Methodology
4.1. Dataset Description and Tools
4.2. The Experimental Design
5. Results and Discussion
5.1. The Original Performance Results
5.2. The Hierarchical Concept Model Performance
5.2.1. Feature Selection Using Decision Tree
5.2.2. Feature Selection Using FCA
5.2.3. An Example of Feature Selection Optimization
5.3. The Performances of Prior Feature Selection Approaches
5.4. Comparison of Performances
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Dhal, P.; Azad, C. A comprehensive survey on feature selection in the various fields of machine learning. Appl. Intell. 2022, 52, 4543–4581. [Google Scholar] [CrossRef]
- Khaire, U.M.; Dhanalakshmi, R. Stability of feature selection algorithm: A review. J. King Saud Univ. Comput. Inf. Sci. 2022, 34, 1060–1073. [Google Scholar] [CrossRef]
- Solorio-Fernández, S.; Carrasco-Ochoa, J.A.; Martínez-Trinidad, J.F. A new hybrid filter–wrapper feature selection method for clustering based on ranking. Neurocomputing 2016, 214, 866–880. [Google Scholar] [CrossRef]
- Cai, J.; Luo, J.; Wang, S.; Yang, S. Feature selection in machine learning: A new perspective. Neurocomputing 2018, 300, 70–79. [Google Scholar] [CrossRef]
- Zhao, H.; Wang, P.; Hu, Q.; Zhu, P. Fuzzy Rough Set Based Feature Selection for Large-Scale Hierarchical Classification. IEEE Trans. Fuzzy Syst. 2019, 27, 1891–1903. [Google Scholar] [CrossRef]
- Bolón-Canedo, V.; Alonso-Betanzos, A. Ensembles for feature selection: A review and future trends. Inf. Fusion 2019, 52, 1–12. [Google Scholar] [CrossRef]
- Zebari, R.; Abdulazeez, A.; Zeebaree, D.; Zebari, D.; Saeed, J. A Comprehensive Review of Dimensionality Reduction Techniques for Feature Selection and Feature Extraction. J. Appl. Sci. Technol. Trends 2020, 1, 56–70. [Google Scholar] [CrossRef]
- Wan, C.; Freitas, A.A. An empirical evaluation of hierarchical feature selection methods for classification in bioinformatics datasets with gene ontology-based features. Artif. Intell. Rev. 2018, 50, 201–240. [Google Scholar] [CrossRef]
- Wetchapram, P.; Muangprathub, J.; Choopradit, B.; Wanichsombat, A. Feature Selection Based on Hierarchical Concept Model Using Formal Concept Analysis. In Proceedings of the 2021 18th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), Chiang Mai, Thailand, 19–22 May 2021; pp. 299–302. [Google Scholar] [CrossRef]
- Hancer, E.; Xue, B.; Zhang, M. A survey on feature selection approaches for clustering. Artif. Intell. Rev. 2020, 53, 4519–4545. [Google Scholar] [CrossRef]
- Cerrada, M.; Sánchez, R.V.; Pacheco, F.; Cabrera, D.; Zurita, G.; Li, C. Hierarchical feature selection based on relative dependency for gear fault diagnosis. Appl. Intell. 2016, 44, 687–703. [Google Scholar] [CrossRef]
- Guo, S.; Zhao, H.; Yang, W. Hierarchical feature selection with multi-granularity clustering structure. Inf. Sci. 2021, 568, 448–462. [Google Scholar] [CrossRef]
- Tuo, Q.; Zhao, H.; Hu, Q. Hierarchical feature selection with subtree based graph regularization. Knowl. Based Syst. 2019, 163, 996–1008. [Google Scholar] [CrossRef]
- Zheng, J.; Luo, C.; Li, T.; Chen, H. A novel hierarchical feature selection method based on large margin nearest neighbor learning. Neurocomputing 2022, 497, 1–12. [Google Scholar] [CrossRef]
- Trabelsi, M.; Meddouri, N.; Maddouri, M. A New Feature Selection Method for Nominal Classifier based on Formal Concept Analysis. Procedia Comput. Sci. 2017, 112, 186–194. [Google Scholar] [CrossRef]
- Azibi, H.; Meddouri, N.; Maddouri, M. Survey on Formal Concept Analysis Based Supervised Classification Techniques. In Machine Learning and Artificial Intelligence; IOS Press: Amsterdam, The Netherlands, 2020; pp. 21–29. [Google Scholar]
- Wang, C.; Huang, Y.; Shao, M.; Hu, Q.; Chen, D. Feature Selection Based on Neighborhood Self-Information. IEEE Trans. Cybern. 2020, 50, 4031–4042. [Google Scholar] [CrossRef]
- Wille, R. Formal concept analysis as mathematical theory of concepts and concept hierarchies. Lect. Notes Artificial Intell. (LNAI) 2005, 3626, 1–33. [Google Scholar] [CrossRef]
- Zhou, H.; Zhang, J.; Zhou, Y.; Guo, X.; Ma, Y. A feature selection algorithm of decision tree based on feature weight. Expert Syst. Appl. 2021, 164, 113842. [Google Scholar] [CrossRef]
- Venkatesh, B.; Anuradha, J. A Review of Feature Selection and Its Methods. Cybern. Inf. Technol. 2019, 19, 3–26. [Google Scholar] [CrossRef]
- Bahassine, S.; Madani, A.; Al-Sarem, M.; Kissi, M. Feature selection using an improved Chi-square for Arabic text classification. J. King Saud Univ. Comput. Inf. Sci. 2020, 32, 225–231. [Google Scholar] [CrossRef]
- Trivedi, S.K. A study on credit scoring modeling with different feature selection and machine learning approaches. Technol. Soc. 2020, 63, 101413. [Google Scholar] [CrossRef]
- Liu, Z.; Zhang, R.; Song, Y.; Ju, W.; Zhang, M. When does MAML work the best? An empirical study on model-agnostic meta-learning in NLP applications. arXiv 2020, arXiv:2005.11700. [Google Scholar] [CrossRef]
- Yang, J.; Xu, H.; Mirzoyan, S.; Chen, T.; Liu, Z.; Liu, Z.; Ju, W.; Liu, L.; Xiao, Z.; Zhang, M.; et al. Poisoning medical knowledge using large language models. Nat. Mach. Intell. 2024, 6, 1156–1168. [Google Scholar] [CrossRef]
- Ju, W.; Mao, Z.; Yi, S.; Qin, Y.; Gu, Y.; Xiao, Z.; Wang, Y.; Luo, X.; Zhang, M. Hypergraph-enhanced Dual Semi-supervised Graph Classification. arXiv 2024, arXiv:2405.04773. [Google Scholar] [CrossRef]
- Zhao, H.; Hu, Q.; Zhu, P.; Wang, Y.; Wang, P. A Recursive Regularization Based Feature Selection Framework for Hierarchical Classification. IEEE Trans. Knowl. Data Eng. 2021, 33, 2833–2846. [Google Scholar] [CrossRef]
- Huang, H.; Liu, H. Feature selection for hierarchical classification via joint semantic and structural information of labels. Knowl. Based Syst. 2020, 195, 105655. [Google Scholar] [CrossRef]
- Liu, X.; Zhao, H. Robust hierarchical feature selection with a capped ℓ2-norm. Neurocomputing 2021, 443, 131–146. [Google Scholar] [CrossRef]
- UCI Machine Learning Repository. Available online: https://archive.ics.uci.edu (accessed on 6 January 2023).
- Yevtushenko, S. Concept Explorer, Open Source JAVA Software. 2009. Available online: http://sourceforge.net/projects/conexp (accessed on 6 January 2023).
z | 0.01 | 0.05 | 0.1 |
---|---|---|---|
Dataset | No. Attributes | No. Instances | No. Classes | Data Type |
---|---|---|---|---|
Dermatology | 33 | 366 | 6 | Integer |
Glass | 10 | 214 | 7 | Real |
Iris | 4 | 150 | 3 | Real |
Lung-cancer | 56 | 32 | 3 | Integer |
Movement | 91 | 360 | 15 | Integer |
Pageblocks | 10 | 5473 | 5 | Integer, real |
Segmentation | 19 | 2310 | 7 | Integer, real |
Soybean | 35 | 307 | 19 | Integer |
Tunadromd | 242 | 4465 | 2 | Integer |
Wine | 13 | 178 | 3 | Integer, real |
Zoo | 17 | 101 | 7 | Nominal, Integer |
Dataset | No. Attributes | The Classification Accuracy (%) | ||
---|---|---|---|---|
Org-DT | Org-SVM | Org-ANN | ||
Dermatology | 33 | 84.99 | 84.50 | 85.09 |
Glass | 10 | 100.0 | 95.26 | 99.05 |
Iris | 4 | 78.67 | 76.98 | 77.00 |
Lung-cancer | 56 | 80.83 | 80.56 | 81.00 |
Movement | 91 | 73.06 | 72.69 | 72.88 |
Pageblocks | 10 | 97.94 | 96.05 | 97.88 |
Segmentation | 19 | 89.09 | 88.63 | 88.70 |
Soybean | 35 | 96.00 | 95.98 | 96.00 |
Tunadromd | 242 | 96.64 | 99.10 | 99.22 |
Wine | 13 | 83.17 | 80.82 | 82.98 |
Zoo | 17 | 96.09 | 95.53 | 95.72 |
Dataset | The Classification Accuracy (%) Using Decision Tree-Based Feature Selection | |||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
LV.1 | LV.2 | LV.3 | LV.4 | LV.5 | ||||||||||||||||
No. Attribute | DT | SVM | ANN | No. Attribute | DT | SVM | ANN | No. Attribute | DT | SVM | ANN | No. Attribute | DT | SVM | ANN | No. Attribute | DT | SVM | ANN | |
Dermatology | 1 | 66.94 | 66.95 | 66.94 | 3 | 73.77 | 75.15 | 73.50 | 5 | 73.50 | 87.45 | 73.50 | 6 | 89.65 | 89.65 | 89.65 | 8 | 96.16 | 96.16 | 96.16 |
Glass | 1 | 100.00 | 82.24 | 99.53 | 3 | 100.00 | 85.19 | 100.00 | 4 | 100.00 | 85.19 | 99.53 | 5 | 100.00 | 99.05 | 99.05 | 7 | 100.00 | 100.00 | 99.52 |
Iris | 1 | 95.33 | 82.67 | 95.33 | 2 | 95.33 | 83.83 | 95.33 | 3 | 96.00 | 83.83 | 96.00 | 4 | 97.33 | 97.33 | 97.33 | 4 | 97.33 | 97.33 | 97.33 |
Lung-cancer | 1 | 87.50 | 74.17 | 87.50 | 3 | 87.50 | 62.50 | 78.12 | 5 | 87.50 | 74.17 | 81.25 | 9 | 80.83 | 80.83 | 78.33 | 11 | 80.83 | 80.83 | 78.33 |
Movement | 1 | 67.22 | 55.00 | 63.61 | 2 | 68.33 | 51.67 | 68.33 | 5 | 75.56 | 89.44 | 71.11 | 8 | 75.28 | 75.28 | 75.28 | 17 | 80.83 | 80.83 | 80.83 |
Pageblocks | 1 | 96.86 | 96.29 | 96.53 | 2 | 97.94 | 99.60 | 96.55 | 5 | 98.06 | 99.60 | 97.20 | 7 | 98.06 | 97.44 | 97.44 | 9 | 98.05 | 98.05 | 97.44 |
Segmentation | 1 | 86.15 | 85.67 | 85.89 | 2 | 87.06 | 92.64 | 86.67 | 5 | 89.00 | 91.21 | 89.00 | 9 | 98.00 | 98.00 | 98.00 | 10 | 96.41 | 96.41 | 96.41 |
Soybean | 1 | 87.23 | 70.50 | 78.72 | 3 | 97.87 | 62.00 | 97.87 | 5 | 95.74 | 70.00 | 91.87 | 11 | 98.00 | 98.00 | 98.00 | 15 | 98.00 | 98.00 | 98.00 |
Tunadromd | 1 | 93.28 | 93.28 | 93.28 | 2 | 96.64 | 96.64 | 96.64 | 4 | 96.86 | 96.75 | 96.75 | 6 | 96.98 | 97.76 | 97.42 | 6 | 96.98 | 97.76 | 97.42 |
Wine | 1 | 83.71 | 83.82 | 83.71 | 3 | 87.64 | 88.24 | 87.64 | 5 | 87.64 | 88.24 | 87.64 | 7 | 87.16 | 87.16 | 87.16 | 10 | 86.50 | 86.50 | 86.50 |
Zoo | 1 | 96.04 | 96.09 | 96.04 | 3 | 96.04 | 76.27 | 96.04 | 5 | 96.04 | 76.27 | 96.04 | 7 | 100.00 | 100.00 | 100.00 | 8 | 98.00 | 98.00 | 98.00 |
Dataset | The Classification Accuracy (%) Using FCA-Based Feature Selection | |||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
LV.1 | LV.2 | LV.3 | LV.4 | LV.5 | ||||||||||||||||
No. Attribute | DT | SVM | ANN | No. Attribute | DT | SVM | ANN | No. Attribute | DT | SVM | ANN | No. Attribute | DT | SVM | ANN | No. Attribute | DT | SVM | ANN | |
Dermatology | 13 | 77.58 | 77.58 | 77.58 | 21 | 77.58 | 77.58 | 77.58 | 24 | 77.58 | 77.58 | 77.58 | 28 | 77.58 | 77.58 | 77.58 | 33 | 77.58 | 77.58 | 77.58 |
Glass | 1 | 100.00 | 92.60 | 99.55 | 3 | 94.87 | 93.96 | 93.96 | 5 | 93.01 | 91.17 | 88.38 | 10 | 93.01 | 91.17 | 88.38 | 10 | 93.01 | 91.17 | 88.38 |
Iris | 2 | 94.67 | 94.00 | 94.00 | 3 | 73.33 | 49.33 | 58.67 | 4 | 80.00 | 84.00 | 84.00 | 4 | 80.00 | 84.00 | 84.00 | 4 | 80.00 | 84.00 | 84.00 |
Lung-cancer | 5 | 97.50 | 97.50 | 97.50 | 8 | 49.17 | 55.83 | 55.25 | 10 | 49.17 | 55.83 | 55.83 | 16 | 49.17 | 55.83 | 55.83 | 32 | 49.17 | 55.83 | 55.83 |
Movement | 7 | 63.33 | 60.28 | 60.28 | 12 | 63.33 | 60.28 | 60.28 | 19 | 99.44 | 99.17 | 99.17 | 28 | 99.44 | 99.17 | 99.17 | 32 | 99.44 | 99.17 | 99.17 |
Pageblocks | 4 | 90.39 | 85.31 | 85.31 | 4 | 90.39 | 85.31 | 85.31 | 7 | 99.80 | 99.85 | 99.85 | 10 | 99.80 | 99.85 | 99.85 | 10 | 99.80 | 99.85 | 99.85 |
Segmentation | 8 | 99.70 | 92.81 | 99.70 | 10 | 99.70 | 91.02 | 99.70 | 19 | 57.36 | 91.02 | 97.49 | 19 | 57.36 | 91.02 | 97.49 | 19 | 57.36 | 91.02 | 97.49 |
Soybean | 2 | 100.00 | 100.00 | 100.00 | 12 | 83.50 | 87.50 | 87.50 | 16 | 61.50 | 68.00 | 64.00 | 35 | 61.50 | 68.00 | 64.00 | 35 | 61.50 | 68.00 | 64.00 |
Tunadromd | 5 | 93.28 | 93.84 | 93.28 | 13 | 94.96 | 96.64 | 97.42 | 19 | 96.19 | 97.87 | 97.98 | 34 | 96.75 | 98.88 | 98.88 | 44 | 96.75 | 99.10 | 99.33 |
Wine | 3 | 81.50 | 83.76 | 83.76 | 6 | 69.71 | 68.56 | 68.56 | 9 | 78.14 | 81.50 | 81.50 | 13 | 78.14 | 81.50 | 81.50 | 13 | 78.14 | 81.50 | 81.50 |
Zoo | 7 | 88.18 | 88.18 | 88.18 | 10 | 88.18 | 88.18 | 88.18 | 12 | 61.55 | 61.55 | 59.55 | 17 | 61.55 | 61.55 | 59.55 | 17 | 61.55 | 61.55 | 59.55 |
Methods | Level | Mean ± SD | Paired t-Test | Sig. |
---|---|---|---|---|
DT | LV.1 | 86.69 ± 8.39 | 2.97 | 0.016 |
LV.2 | 89.14 ± 6.74 | 2.55 | 0.031 | |
LV.3 | 91.00 ± 7.13 | 1.59 | 0.146 | |
LV.4 | 92.43 ± 3.07 | 2.22 | 0.053 | |
LV.5 | 92.51 ± 2.17 | 3.03 | 0.014 | |
SVM | LV.1 | 79.34 ± 3.12 | 4.88 | 0.001 |
LV.2 | 77.70 ± 4.07 | 4.15 | 0.002 | |
LV.3 | 84.54 ± 3.52 | 2.85 | 0.019 | |
LV.4 | 92.27 ± 0.94 | 2.46 | 0.036 | |
LV.5 | 93.21 ± 0.88 | 1.56 | 0.153 | |
ANN | LV.1 | 86.28 ± 12.67 | 2.55 | 0.031 |
LV.2 | 88.00 ± 6.50 | 3.20 | 0.011 | |
LV.3 | 89.51 ± 6.20 | 2.59 | 0.029 | |
LV.4 | 92.02 ± 3.45 | 2.35 | 0.043 | |
LV.5 | 92.85 ± 3.33 | 1.65 | 0.133 |
Dataset | IG | CS | FS | BS | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
DT | SVM | ANN | No. | DT | SVM | ANN | No. | DT | SVM | ANN | No. | DT | SVM | ANN | No. | |
Dermatology | 92.35 | 93.00 | 92.35 | 12 | 86.07 | 87.29 | 86.07 | 7 | 85.79 | 87.85 | 84.64 | 9 | 85.79 | 87.85 | 85.79 | 33 |
Glass | 99.53 | 100.00 | 100.00 | 4 | 99.53 | 99.82 | 100.00 | 3 | 100.00 | 100.00 | 100.00 | 9 | 100.00 | 100.00 | 100.00 | 3 |
Iris | 95.33 | 95.33 | 95.00 | 2 | 95.33 | 94.22 | 95.33 | 2 | 96.00 | 96.04 | 96.00 | 1 | 96.00 | 96.04 | 96.67 | 3 |
Lung-cancer | 68.50 | 68.50 | 69.00 | 2 | 87.50 | 86.50 | 86.50 | 16 | 87.50 | 87.50 | 83.50 | 55 | 87.50 | 87.50 | 87.50 | 2 |
Movement | 66.64 | 66.65 | 65.45 | 16 | 75.00 | 74.51 | 76.02 | 34 | 76.94 | 82.50 | 77.85 | 88 | 79.44 | 82.50 | 83.33 | 6 |
Pageblocks | 97.92 | 97.92 | 97.80 | 4 | 98.19 | 97.63 | 98.02 | 5 | 98.14 | 98.06 | 97.94 | 8 | 98.32 | 98.12 | 98.06 | 4 |
Segmentation | 90.91 | 89.33 | 90.87 | 10 | 88.61 | 88.61 | 88.52 | 7 | 89.22 | 89.48 | 80.78 | 6 | 89.48 | 89.52 | 97.36 | 10 |
Soybean | 97.87 | 100.00 | 98.87 | 14 | 100.00 | 100.00 | 98.87 | 10 | 100.00 | 100.00 | 97.87 | 2 | 100.00 | 100.00 | 100.00 | 34 |
Wine | 82.58 | 82.50 | 82.00 | 5 | 83.15 | 83.15 | 84.22 | 6 | 85.96 | 88.22 | 90.45 | 9 | 87.08 | 88.48 | 89.89 | 10 |
Zoo | 99.00 | 99.50 | 100.00 | 8 | 95.09 | 96.56 | 96.00 | 4 | 96.04 | 96.08 | 100.00 | 6 | 96.04 | 96.08 | 96.04 | 16 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Saelee, J.; Wetchapram, P.; Wanichsombat, A.; Intarasit, A.; Muangprathub, J.; Boongasame, L.; Choopradit, B. Enhanced Feature Selection via Hierarchical Concept Modeling. Appl. Sci. 2024, 14, 10965. https://doi.org/10.3390/app142310965