Optimizing Accuracy, Recall, Specificity, and Precision Using ILP
Abstract
1. Introduction
2. Problem Formulation and Results
3. Simulations
4. Real-World Applications
4.1. Medical Diagnosis
4.1.1. Dataset, Preprocessing, and Classifier Training
4.1.2. Optimization Using ILP
- Accuracy: Ensures overall performance across both benign and malignant cases.
- Recall: Measures the ability to correctly identify malignant tumors, which is crucial for minimizing false negatives.
- Specificity: Focuses on correctly identifying benign tumors, reducing unnecessary follow-up procedures.
- Accuracy weight: a moderate weight to maintain overall performance.
- Recall weight: the highest priority was given to recall, so that malignant cases are identified as often as possible.
- Specificity weight: specificity was considered, but with a lower weight than recall; reducing false positives is important but less critical than detecting malignant cases.
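The weighted objective built from these three metrics can be sketched as follows. The weights are left as parameters because the paper's exact values are not reproduced here; the values in the usage example are illustrative assumptions only.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = malignant, 0 = benign)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def weighted_metric(y_true, y_pred, w_acc, w_rec, w_spec):
    """Weighted combination of accuracy, recall, and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn)        # sensitivity on malignant cases
    specificity = tn / (tn + fp)   # correct identification of benign cases
    return w_acc * accuracy + w_rec * recall + w_spec * specificity
```

For example, with assumed weights of 0.2/0.5/0.3, `weighted_metric([1, 1, 0, 0], [1, 0, 0, 0], 0.2, 0.5, 0.3)` combines an accuracy of 0.75, a recall of 0.5, and a specificity of 1.0 into a single score of 0.7.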
4.1.3. Results and Discussion
4.2. Earthquake Occurrence Prediction
4.2.1. Dataset and Classifier Training
4.2.2. Optimization Using ILP
- Recall: This is the most critical metric, because missing an earthquake (false negative) could lead to catastrophic consequences by failing to issue a timely warning or take preventive actions.
- Specificity: While false positives (predicting an earthquake when there is none) should be minimized, they are generally less costly than missing a real earthquake. Maintaining a reasonable level of specificity is still important, however, to avoid unnecessary alarms and the diversion of resources to false events.
- Accuracy: While accuracy is a useful overall measure, it can be misleading, especially in imbalanced datasets.
- Accuracy weight;
- Recall weight;
- Specificity weight.
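Because a single decision threshold on a finite set of scores induces only finitely many distinct confusion matrices, the optimum of the weighted objective can be found by evaluating every candidate cut. The brute-force sweep below is not the paper's ILP formulation itself, but it recovers the same single-threshold optimum and illustrates what the optimization is searching for; the scores, labels, and weights in the test are assumed toy values.

```python
def best_threshold(scores, y_true, w_acc, w_rec, w_spec):
    """Evaluate every distinct threshold on the scores and return
    (best_threshold, best_weighted_metric)."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    # Candidate thresholds: each distinct score, plus one above the
    # maximum (the "predict nothing positive" option).
    candidates = sorted(set(scores)) + [max(scores) + 1.0]
    best_t, best_val = None, float("-inf")
    for t in candidates:
        pred = [1 if s >= t else 0 for s in scores]
        tp = sum(p for p, y in zip(pred, y_true) if y == 1)
        tn = sum(1 - p for p, y in zip(pred, y_true) if y == 0)
        acc = (tp + tn) / (pos + neg)
        rec = tp / pos
        spec = tn / neg
        val = w_acc * acc + w_rec * rec + w_spec * spec
        if val > best_val:
            best_t, best_val = t, val
    return best_t, best_val
```

With a recall-heavy weighting, the sweep typically selects a lower threshold than the default, trading some specificity for fewer missed events, which mirrors the earthquake results reported below.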
4.2.3. Results and Discussion
4.3. Spam Email Detection
4.3.1. Dataset and Classifier Training
4.3.2. Optimization Using ILP
- Specificity: It is critical to minimize false positives (misclassifying legitimate emails as spam), especially if important emails are inadvertently sent to the spam folder.
- Recall: Missing a spam email (false negative) means that a harmful or unwanted message could reach the user’s inbox, which is a concern for security and user experience.
- Accuracy: As in the previous case study, accuracy may be unreliable on imbalanced datasets, where spam emails are far outnumbered by legitimate ones.
- Accuracy weight;
- Recall weight;
- Specificity weight.
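The caveat about accuracy on imbalanced data is easy to verify numerically: a degenerate classifier that never flags spam can look excellent by accuracy while having zero recall. The 95/5 class split below is an assumed example, not the actual ratio of the dataset used in the paper.

```python
# 1000 emails: 950 legitimate (0), 50 spam (1) -- assumed imbalance for illustration.
y_true = [0] * 950 + [1] * 50
y_pred = [0] * 1000  # degenerate classifier: never predicts spam

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = sum(p for p, t in zip(y_pred, y_true) if t == 1) / 50
specificity = sum(1 - p for p, t in zip(y_pred, y_true) if t == 0) / 950

print(accuracy, recall, specificity)  # 0.95 0.0 1.0
```

Despite 95% accuracy, every spam email reaches the inbox, which is why recall and specificity enter the objective with their own weights.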
4.3.3. Results and Discussion
4.4. Sentiment Analysis of Movie Reviews
4.4.1. Dataset and Classifier Training
4.4.2. Optimization Using ILP
- Accuracy weight;
- Recall weight;
- Specificity weight.
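One way the single-threshold selection problem can be cast as an integer linear program is sketched below; the notation is assumed for illustration rather than taken from the paper's text. Sort the classifier scores so that $s_1 \ge s_2 \ge \dots \ge s_n$, let binary variables $z_i$ indicate that sample $i$ is predicted positive, and force the predictions to correspond to one threshold via a monotonicity constraint:

```latex
\begin{aligned}
\max_{z \in \{0,1\}^n} \quad
  & w_{\mathrm{acc}} \, \frac{\sum_{i : y_i = 1} z_i + \sum_{i : y_i = 0} (1 - z_i)}{n}
  + w_{\mathrm{rec}} \, \frac{\sum_{i : y_i = 1} z_i}{P}
  + w_{\mathrm{spec}} \, \frac{\sum_{i : y_i = 0} (1 - z_i)}{N} \\
\text{s.t.} \quad
  & z_{i+1} \le z_i, \qquad i = 1, \dots, n - 1,
\end{aligned}
```

where $P$ and $N$ are the numbers of positive and negative samples. Since $P$, $N$, and $n$ are constants, the objective is linear in $z$. Precision, by contrast, has a decision-dependent denominator $\sum_i z_i$, so including it requires handling a linear-fractional objective, e.g., via a Charnes-Cooper-style transformation.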
4.4.3. Results and Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
| Threshold | Recall | Specificity | Accuracy | Weighted Metric |
| --- | --- | --- | --- | --- |
| Default () | 94.34% | 97.76% | 96.49% | 95.45% |
| Optimized () | 97.64% | 95.80% | 96.49% | 97.04% |
| Threshold | Recall | Specificity | Accuracy | Weighted Metric |
| --- | --- | --- | --- | --- |
| Default () | 72.65% | 73.84% | 73.23% | 73.06% |
| Optimized () | 84.65% | 60.69% | 72.92% | 76.29% |
| Threshold | Recall | Specificity | Accuracy | Weighted Metric |
| --- | --- | --- | --- | --- |
| Default () | 93.77% | 99.82% | 98.29% | 97.85% |
| Optimized () | 98.62% | 99.12% | 98.99% | 98.95% |
| Threshold | Recall | Specificity | Accuracy | Weighted Metric |
| --- | --- | --- | --- | --- |
| Default () | 74.03% | 83.59% | 78.81% | 78.33% |
| Optimized () | 78.38% | 79.55% | 78.97% | 78.91% |
Share and Cite
Marioriyad, A.; Ramazi, P. Optimizing Accuracy, Recall, Specificity, and Precision Using ILP. Mathematics 2025, 13, 1059. https://doi.org/10.3390/math13071059