Machine-Learning-Based Disease Diagnosis and Prediction

A special issue of Diagnostics (ISSN 2075-4418). This special issue belongs to the section "Machine Learning and Artificial Intelligence in Diagnostics".

Deadline for manuscript submissions: 30 June 2025 | Viewed by 6215

Special Issue Editor


Guest Editor
Centre for Trusted Internet and Community, National University of Singapore (NUS), Singapore 119228, Singapore
Interests: artificial intelligence; machine learning; image segmentation; disease classification; computer vision

Special Issue Information

Dear Colleagues,

We are pleased to invite you to contribute to the Special Issue on ‘Machine-Learning-Based Disease Diagnosis and Prediction’ in Diagnostics. In recent years, the intersection of machine learning and healthcare has revolutionized disease diagnosis and prediction, presenting a promising frontier in medical research. This Special Issue aims to showcase the latest advancements, methodologies, and applications in leveraging machine learning techniques to enhance diagnostic accuracy and predict disease outcomes.

Machine learning algorithms have demonstrated remarkable efficacy in extracting meaningful patterns and insights from large-scale medical datasets, encompassing various modalities such as genomic, imaging, clinical, and wearable sensor data. By harnessing the power of artificial intelligence, researchers and clinicians can now decipher complex relationships between biomarkers, disease manifestations, and patient outcomes, leading to earlier detection, more accurate diagnosis, and personalized treatment strategies. The scope of this Special Issue encompasses a broad spectrum of research topics, including developing and validating machine learning models for disease classification, risk stratification, prognosis prediction, and treatment response assessment. We welcome original research articles, reviews, and methodological papers that explore innovative approaches, address key challenges, and contribute to advancing the field of machine-learning-based disease diagnosis and prediction.

This Special Issue aims to provide a platform for researchers and practitioners to share their latest findings, methodologies, and insights in machine-learning-based disease diagnosis and prediction. The subject matter of this Special Issue aligns with the scope of Diagnostics, which encompasses research on innovative diagnostic methods, tools, and technologies for various diseases. By fostering interdisciplinary collaboration and knowledge exchange, we aim to advance the state of the art in disease diagnosis and prediction, ultimately contributing to improved patient outcomes and healthcare delivery.

In this Special Issue, original research articles and reviews are welcome. Research areas may include (but are not limited to) the following:

  • Development and validation of machine learning models for disease diagnosis and prediction;
  • Integration of multimodal data (e.g., imaging, genomic, and clinical) for enhanced diagnostic accuracy;
  • Artificial intelligence in healthcare;
  • Ethical and regulatory considerations in the deployment of machine learning for healthcare applications;
  • Case studies and applications of machine learning in specific disease domains (e.g., cancer, cardiovascular diseases, and neurodegenerative disorders);
  • Comparative studies and benchmarking of machine learning approaches for disease diagnosis and prediction;
  • Healthcare analytics;
  • Bioinformatics;
  • Medical imaging analysis.

We look forward to receiving your contributions.

Dr. Fakhar Abbas
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Diagnostics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • development and validation of machine learning models for disease diagnosis and prediction
  • integration of multimodal data (e.g., imaging, genomic, and clinical) for enhanced diagnostic accuracy
  • artificial intelligence in healthcare
  • ethical and regulatory considerations in the deployment of machine learning for healthcare applications
  • case studies and applications of machine learning in specific disease domains (e.g., cancer, cardiovascular diseases, and neurodegenerative disorders)
  • comparative studies and benchmarking of machine learning approaches for disease diagnosis and prediction
  • healthcare analytics
  • bioinformatics
  • medical imaging analysis

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad-scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found on the MDPI website.

Published Papers (7 papers)


Research

14 pages, 1757 KiB  
Article
Ensemble Algorithm Based on Gene Selection, Data Augmentation, and Boosting Approaches for Ovarian Cancer Classification
by Zne-Jung Lee, Jing-Xun Cai, Liang-Hung Wang and Ming-Ren Yang
Diagnostics 2024, 14(24), 2772; https://doi.org/10.3390/diagnostics14242772 - 10 Dec 2024
Viewed by 404
Abstract
Background: Ovarian cancer is a difficult and lethal illness that requires early detection and precise classification for effective therapy. Microarray technology has permitted the simultaneous assessment of hundreds of genes’ expression levels, yielding important insights into the molecular pathways driving ovarian cancer. To reduce computational complexity and improve accuracy, choosing the most likely differential genes to explain the impacts of ovarian cancer is necessary. Medical datasets, including those related to ovarian cancer, are often limited in size due to privacy concerns, data collection challenges, and the rarity of certain conditions. Data augmentation allows researchers to expand the dataset, providing a larger and more diverse set of examples for model training. Recent advances in machine learning and bioinformatics have shown promise in improving ovarian cancer classification based on gene information. Methods: In this paper, we present an ensemble algorithm based on gene selection, data augmentation, and boosting approaches for ovarian cancer classification. In the proposed approach, the initial genetic data were first subjected to feature selection. Results: The target genes were screened and combined with data augmentation and ensemble boosting algorithms. From the results, the chosen ten genes could accurately classify ovarian cancer at 98.21%. Conclusions: We further show that the proposed algorithm based on clustering approaches is effective for real-world ovarian cancer data, with 100% accuracy and strong performance in distinguishing between distinct ovarian cancer subtypes. The proposed algorithm may help doctors identify ovarian cancer patients early and develop individualized treatment plans.
(This article belongs to the Special Issue Machine-Learning-Based Disease Diagnosis and Prediction)

26 pages, 18958 KiB  
Article
CAD-EYE: An Automated System for Multi-Eye Disease Classification Using Feature Fusion with Deep Learning Models and Fluorescence Imaging for Enhanced Interpretability
by Maimoona Khalid, Muhammad Zaheer Sajid, Ayman Youssef, Nauman Ali Khan, Muhammad Fareed Hamid and Fakhar Abbas
Diagnostics 2024, 14(23), 2679; https://doi.org/10.3390/diagnostics14232679 - 27 Nov 2024
Viewed by 539
Abstract
Background: Diabetic retinopathy, hypertensive retinopathy, glaucoma, and contrast-related eye diseases are well-recognized conditions resulting from high blood pressure, rising blood glucose, and elevated eye pressure. Later-stage symptoms usually include patches of cotton wool, restricted veins in the optic nerve, and buildup of blood in the optic nerve. Severe consequences include damage of the visual nerve, and retinal artery obstruction, and possible blindness may result from these conditions. An early illness diagnosis is made easier by the use of deep learning models and artificial intelligence (AI). Objectives: This study introduces a novel methodology called CAD-EYE for classifying diabetic retinopathy, hypertensive retinopathy, glaucoma, and contrast-related eye issues. Methods: The proposed system combines the features extracted using two deep learning (DL) models (MobileNet and EfficientNet) using feature fusion to increase the diagnostic system efficiency. The system uses fluorescence imaging for increasing accuracy as an image processing algorithm. The algorithm is added to increase the interpretability and explainability of the CAD-EYE system. This algorithm was not used in such an application in the previous literature to the best of the authors’ knowledge. The study utilizes datasets sourced from reputable internet platforms to train the proposed system. Results: The system was trained on 65,871 fundus images from the collected datasets, achieving a 98% classification accuracy. A comparative analysis demonstrates that CAD-EYE surpasses cutting-edge models such as ResNet, GoogLeNet, VGGNet, InceptionV3, and Xception in terms of classification accuracy. A state-of-the-art comparison shows the superior performance of the model against previous work in the literature. Conclusions: These findings support the usefulness of CAD-EYE as a diagnosis tool that can help medical professionals diagnose an eye disease. 
However, this tool will not be replacing optometrists.

23 pages, 3971 KiB  
Article
Using Machine Learning and Feature Importance to Identify Risk Factors for Mortality in Pediatric Heart Surgery
by Lorenz A. Kapsner, Manuel Feißt, Ariawan Purbojo, Hans-Ulrich Prokosch, Thomas Ganslandt, Sven Dittrich, Jonathan M. Mang and Wolfgang Wällisch
Diagnostics 2024, 14(22), 2587; https://doi.org/10.3390/diagnostics14222587 - 18 Nov 2024
Viewed by 641
Abstract
Background: The objective of this IRB-approved retrospective monocentric study was to identify risk factors for mortality after surgery for congenital heart defects (CHDs) in pediatric patients using machine learning (ML). CHD belongs to the most common congenital malformations, and remains the leading mortality cause from birth defects. Methods: The most recent available hospital encounter for each patient with an age <18 years hospitalized for CHD-related cardiac surgery between the years 2011 and 2020 was included in this study. The cohort consisted of 1302 eligible patients (mean age [SD]: 402.92 [±562.31] days), who were categorized into four disease groups. A random survival forest (RSF) and the ‘eXtreme Gradient Boosting’ algorithm (XGB) were applied to model mortality (incidence: 5.6% [n = 73 events]). All models were then applied to predict the outcome in an independent holdout test dataset (40% of the cohort). Results: RSF and XGB achieved average C-indices of 0.85 (±0.01) and 0.79 (±0.03), respectively. Feature importance was assessed with ‘SHapley Additive exPlanations’ (SHAP) and ‘Time-dependent explanations of machine learning survival models’ (SurvSHAP(t)), both of which revealed high importance of the maximum values of serum creatinine observed within 72 h post-surgery for both ML methods. Conclusions: ML methods, along with model explainability tools, can reveal interesting insights into mortality risk after surgery for CHD. The proposed analytical workflow can serve as a blueprint for translating the analysis into a federated setting that builds upon the infrastructure of the German Medical Informatics Initiative.

17 pages, 3396 KiB  
Article
Temporal Lobe Epilepsy Focus Detection Based on the Correlation Between Brain MR Images and EEG Recordings with a Decision Tree
by Cansel Ficici, Ziya Telatar, Osman Erogul and Onur Kocak
Diagnostics 2024, 14(22), 2509; https://doi.org/10.3390/diagnostics14222509 - 9 Nov 2024
Viewed by 597
Abstract
Background/Objectives: In this study, a medical decision support system is presented to assist physicians in epileptic focus detection by correlating MRI and EEG data of temporal lobe epilepsy patients. Methods: By exploiting the asymmetry in the hippocampus in MRI images and using voxel-based morphometry analysis, gray matter reduction in the temporal and limbic lobes is detected, and epileptic focus prediction is realized. In addition, an epileptic focus is also determined by calculating the asymmetry score from EEG channels. Finally, epileptic focus detection was performed by associating MRI and EEG data with a decision tree. Results: The results obtained from the proposed algorithm provide 100% overlap with the physician’s finding on the EEG data. Conclusions: MRI and EEG correlation in epileptic focus detection was improved compared with physicians. The proposed algorithm can be used as a medical decision support system for epilepsy diagnosis, treatment, and surgery planning.

28 pages, 4011 KiB  
Article
Advanced Deep Learning Fusion Model for Early Multi-Classification of Lung and Colon Cancer Using Histopathological Images
by A. A. Abd El-Aziz, Mahmood A. Mahmood and Sameh Abd El-Ghany
Diagnostics 2024, 14(20), 2274; https://doi.org/10.3390/diagnostics14202274 - 12 Oct 2024
Viewed by 1405
Abstract
Background: In recent years, the healthcare field has experienced significant advancements. New diagnostic techniques, treatments, and insights into the causes of various diseases have emerged. Despite these progressions, cancer remains a major concern. It is a widespread illness affecting individuals of all ages and leads to one out of every six deaths. Lung and colon cancer alone account for nearly two million fatalities. Though it is rare for lung and colon cancers to co-occur, the spread of cancer cells between these two areas—known as metastasis—is notably high. Early detection of cancer greatly increases survival rates. Currently, histopathological image (HI) diagnosis and appropriate treatment are key methods for reducing cancer mortality and enhancing survival rates. Digital image processing (DIP) and deep learning (DL) algorithms can be employed to analyze the HIs of five different types of lung and colon tissues. Methods: Therefore, this paper proposes a refined DL model that integrates feature fusion for the multi-classification of lung and colon cancers. The proposed model incorporates three DL architectures: ResNet-101V2, NASNetMobile, and EfficientNet-B0. Each model has limitations concerning variations in the shape and texture of input images. To address this, the proposed model utilizes a concatenate layer to merge the pre-trained individual feature vectors from ResNet-101V2, NASNetMobile, and EfficientNet-B0 into a single feature vector, which is then fine-tuned. As a result, the proposed DL model achieves high success in multi-classification by leveraging the strengths of all three models to enhance overall accuracy. This model aims to assist pathologists in the early detection of lung and colon cancer with reduced effort, time, and cost. The proposed DL model was evaluated using the LC25000 dataset, which contains colon and lung HIs. The dataset was pre-processed using resizing and normalization techniques. 
Results: The model was tested and compared with recent DL models, achieving impressive results: 99.8% for precision, 99.8% for recall, 99.8% for F1-score, 99.96% for specificity, and 99.94% for accuracy. Conclusions: Thus, the proposed DL model demonstrates exceptional performance across all classification categories.

11 pages, 11605 KiB  
Article
Evaluating Prediction Models with Hearing Handicap Inventory for the Elderly in Chronic Otitis Media Patients
by Hee Soo Yoon, Min Jin Kim, Kang Hyeon Lim, Min Suk Kim, Byung Jae Kang, Yoon Chan Rah and June Choi
Diagnostics 2024, 14(18), 2000; https://doi.org/10.3390/diagnostics14182000 - 10 Sep 2024
Viewed by 566
Abstract
Background: This retrospective, cross-sectional study aimed to assess the functional hearing capacity of individuals with Chronic Otitis Media (COM) using prediction modeling techniques and the Hearing Handicap Inventory for the Elderly (HHIE) questionnaire. This study investigated the potential of predictive models to identify hearing levels in patients with COM. Methods: We comprehensively examined 289 individuals diagnosed with COM, of whom 136 reported tinnitus and 143 did not. This study involved a detailed analysis of various patient characteristics and HHIE questionnaire results. Logistic and Random Forest models were employed and compared based on key performance metrics. Results: The logistic model demonstrated a slightly higher accuracy (73.56%), area under the curve (AUC; 0.73), Kappa value (0.45), and F1 score (0.78) than the Random Forest model. These findings suggest the superior predictive performance of the logistic model in identifying hearing levels in patients with COM. Conclusions: Although the AUC for the logistic regression did not meet the benchmark, this study highlights the potential for enhanced reliability and improved performance metrics using a larger dataset. The integration of prediction modeling techniques and the HHIE questionnaire shows promise for achieving greater diagnostic accuracy and refining intervention strategies for individuals with COM.

17 pages, 2101 KiB  
Article
Predictors of In-Hospital Mortality after Thrombectomy in Anterior Circulation Large Vessel Occlusion: A Retrospective, Machine Learning Study
by Ivan Petrović, Serena Broggi, Monika Killer-Oberpfalzer, Johannes A. R. Pfaff, Christoph J. Griessenauer, Isidora Milosavljević, Ana Balenović, Johannes S. Mutzenbach and Slaven Pikija
Diagnostics 2024, 14(14), 1531; https://doi.org/10.3390/diagnostics14141531 - 16 Jul 2024
Viewed by 1056
Abstract
Background: Despite the increased use of mechanical thrombectomy (MT) in recent years, there remains a lack of research on in-hospital mortality rates following the procedure, the primary factors influencing these rates, and the potential for predicting them. This study aimed to utilize interpretable machine learning (ML) to help clarify these uncertainties. Methods: This retrospective study involved patients with anterior circulation large vessel occlusion (LVO)-related ischemic stroke who underwent MT. The patient division was made into two groups: (I) the in-hospital death group, referred to as miserable outcome, and (II) the in-hospital survival group, or favorable outcome. Python 3.10.9 was utilized to develop the machine learning models, which consisted of two types based on input features: (I) the Pre-MT model, incorporating baseline features, and (II) the Post-MT model, which included both baseline and MT-related features. After a feature selection process, the models were trained, internally evaluated, and tested, after which interpretation frameworks were employed to clarify the decision-making processes. Results: This study included 602 patients with a median age of 76 years (interquartile range (IQR) 65–83), out of which 54% (n = 328) were female, and 22% (n = 133) had miserable outcomes. Selected baseline features were age, baseline National Institutes of Health Stroke Scale (NIHSS) value, neutrophil-to-lymphocyte ratio (NLR), international normalized ratio (INR), the type of the affected vessel (‘Vessel type’), peripheral arterial disease (PAD), baseline glycemia, and premorbid modified Rankin scale (pre-mRS). The highest odds ratio of 4.504 was observed with the presence of peripheral arterial disease (95% confidence interval (CI), 2.120–9.569). The Pre-MT model achieved an area under the curve (AUC) value of around 79% utilizing these features, and the interpretable framework discovered the baseline NIHSS value as the most influential factor. 
In the second data set, selected features were the same, excluding pre-mRS and including puncture-to-procedure-end time (PET) and onset-to-puncture time (OPT). The AUC value of the Post-MT model was around 84% with age being the highest-ranked feature. Conclusions: This study demonstrates the moderate to strong effectiveness of interpretable machine learning models in predicting in-hospital mortality following mechanical thrombectomy for ischemic stroke, with AUCs of 0.792 for the Pre-MT model and 0.837 for the Post-MT model. Key predictors included patient age, baseline NIHSS, NLR, INR, occluded vessel type, PAD, baseline glycemia, pre-mRS, PET, and OPT. These findings provide valuable insights into risk factors and could improve post-procedural patient management.
