Article

Optimizing Model Performance and Interpretability: Application to Biological Data Classification

1 College of Computer Science and Technology, Jilin University, Changchun 130012, China
2 Systems Biology Lab for Metabolic Reprogramming, Department of Human Genetics and Cell Biology, School of Medicine, Southern University of Science and Technology, Shenzhen 518055, China
3 School of Mathematics, Jilin University, Changchun 130012, China
4 School of Artificial Intelligence, Jilin University, Changchun 130012, China
* Authors to whom correspondence should be addressed.
Genes 2025, 16(3), 297; https://doi.org/10.3390/genes16030297
Submission received: 19 January 2025 / Revised: 11 February 2025 / Accepted: 24 February 2025 / Published: 28 February 2025
(This article belongs to the Section Bioinformatics)

Abstract

Background/objectives: In biological data classification, it is difficult to achieve high classification accuracy and interpretability at the same time. This study presents a framework that addresses both challenges in transcriptomic-data-based classification. The goal is to select features, models, and a meta-voting classifier that together optimize both classification performance and interpretability. Methods: The framework consists of a four-step feature selection process: (1) the identification of metabolic pathways whose enzyme-gene expressions discriminate samples with different labels, aiding interpretability; (2) the selection of pathways whose expression variance is largely captured by the first principal component of the gene expression matrix; (3) the selection of minimal sets of genes whose collective discerning power covers 95% of the pathway-based discerning power; and (4) the introduction of adversarial samples to identify and filter out genes sensitive to such samples. Additionally, adversarial samples are used to select the optimal classification model, and a meta-voting classifier is constructed from the optimized model results. Results: When the framework was applied to two cancer classification problems, the prediction performance in the binary classification was comparable to that of the full-gene model, with F1-score differences between −5% and 5%. In the ternary classification, the performance was significantly better, with F1-score differences ranging from −2% to 12%, while the selected feature genes remained highly interpretable. Conclusions: This framework effectively integrates feature selection, adversarial sample handling, and model optimization, offering a valuable tool for a wide range of biological data classification problems. Its ability to balance classification accuracy with high interpretability makes it highly applicable in the field of computational biology.
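Steps (2) and (3) of the feature selection process can be illustrated with a minimal sketch. The abstract does not define "discerning power", so the `separation` score below (the standardized gap between class means along the first principal component) is a hypothetical stand-in, and the greedy forward search is an assumed heuristic that approximates, but does not guarantee, a minimal gene set. All function names are illustrative and are not taken from the paper.

```python
import numpy as np

def pc1_projection(expr):
    """Return PC1 scores and the fraction of variance PC1 captures.

    expr: (n_samples, n_genes) expression matrix for one pathway.
    """
    centered = expr - expr.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    total = float(np.sum(s ** 2))
    # Squared singular values are proportional to per-component variance.
    ratio = float(s[0] ** 2 / total) if total > 0 else 0.0
    return centered @ vt[0], ratio

def separation(expr, labels):
    """Hypothetical 'discerning power': standardized gap between the two
    class means along PC1 (an assumption, not the paper's definition)."""
    scores, _ = pc1_projection(expr)
    a, b = scores[labels == 0], scores[labels == 1]
    pooled = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2.0)
    return float(abs(a.mean() - b.mean()) / pooled) if pooled > 0 else 0.0

def greedy_gene_subset(expr, labels, coverage=0.95):
    """Greedily add genes until the subset reaches `coverage` of the
    full-pathway separation score."""
    target = coverage * separation(expr, labels)
    chosen, remaining = [], list(range(expr.shape[1]))
    while remaining:
        # Pick the gene that most improves the score of the current subset.
        best = max(remaining,
                   key=lambda g: separation(expr[:, chosen + [g]], labels))
        chosen.append(best)
        remaining.remove(best)
        if separation(expr[:, chosen], labels) >= target:
            break
    return chosen

# Synthetic example: 6 genes, of which the first two differ between classes.
rng = np.random.default_rng(0)
labels = np.array([0] * 20 + [1] * 20)
expr = rng.normal(size=(40, 6))
expr[labels == 1, :2] += 3.0

_, pc1_ratio = pc1_projection(expr)
subset = greedy_gene_subset(expr, labels)
print("PC1 variance ratio:", round(pc1_ratio, 3))
print("Selected gene indices:", subset)
```

In this sketch a pathway would pass step (2) when `pc1_ratio` exceeds some chosen cutoff; the abstract does not state the cutoff, so none is hard-coded here.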
Keywords: feature gene selection; model selection; machine learning; interpretability

Share and Cite

MDPI and ACS Style

Huang, Z.; Mu, X.; Cao, Y.; Chen, Q.; Qiao, S.; Shi, B.; Xiao, G.; Wang, Y.; Xu, Y. Optimizing Model Performance and Interpretability: Application to Biological Data Classification. Genes 2025, 16, 297. https://doi.org/10.3390/genes16030297

