BioMedInformatics, Volume 4, Issue 1 (March 2024) – 48 articles

Cover Story (view full-size image): This figure summarizes our young journal BioMedInformatics: it provides a bridge between medical science and new developments in bioinformatics in all its forms, from biomedical informatics to computational biology and medicine, linking patient diagnosis to new discoveries and personalized medicine. View this paper
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the tables of contents of newly released issues.
  • PDF is the official format for papers, which are published in both HTML and PDF forms. To view a paper in PDF format, click on the "PDF Full-text" link and open it with the free Adobe Reader.
24 pages, 11145 KiB  
Article
Comparing ANOVA and PowerShap Feature Selection Methods via Shapley Additive Explanations of Models of Mental Workload Built with the Theta and Alpha EEG Band Ratios
by Bujar Raufi and Luca Longo
BioMedInformatics 2024, 4(1), 853-876; https://doi.org/10.3390/biomedinformatics4010048 - 19 Mar 2024
Viewed by 496
Abstract
Background: Creating models to differentiate self-reported mental workload perceptions is challenging and requires machine learning to identify features from EEG signals. EEG band ratios quantify human activity, but limited research on mental workload assessment exists. This study evaluates the use of theta-to-alpha and alpha-to-theta EEG band ratio features to distinguish human self-reported perceptions of mental workload. Methods: In this study, EEG data from 48 participants were analyzed while engaged in resting and task-intensive activities. Multiple mental workload indices were developed using different EEG channel clusters and band ratios. ANOVA’s F-score and PowerSHAP were used to extract the statistical features. At the same time, models were built and tested using techniques such as Logistic Regression, Gradient Boosting, and Random Forest. These models were then explained using Shapley Additive Explanations. Results: Based on the results, using PowerSHAP to select features led to improved model performance, exhibiting an accuracy exceeding 90% across three mental workload indexes. In contrast, statistical techniques for model building indicated poorer results across all mental workload indexes. Moreover, using Shapley values to evaluate feature contributions to the model output, it was noted that features rated low in importance by both ANOVA F-score and PowerSHAP measures played the most substantial role in determining the model output. Conclusions: Using models with Shapley values can reduce data complexity and improve the training of better discriminative models for perceived human mental workload. However, the outcomes can sometimes be unclear due to variations in the significance of features during the selection process and their actual impact on the model output. Full article
(This article belongs to the Special Issue Feature Papers in Medical Statistics and Data Science Section)
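As a rough illustration of the comparison described above, the sketch below selects features with an ANOVA F-score filter, fits a gradient-boosting classifier, and ranks the selected features by mean absolute SHAP value. It uses synthetic data and generic feature indices, not the authors' EEG pipeline; PowerSHAP itself is a separate wrapper method and is omitted here.

```python
# Minimal sketch: ANOVA F-score feature selection + SHAP-based feature ranking.
# Synthetic stand-in data; not the authors' EEG band-ratio pipeline.
import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=20, n_informative=6, random_state=0)

# Filter-style selection with ANOVA F-scores (one of the two options compared in the paper).
selector = SelectKBest(score_func=f_classif, k=8).fit(X, y)
X_sel = selector.transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X_sel, y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))

# Explain the fitted model with SHAP and rank the selected features.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_te)       # (n_samples, n_features) for a binary GBM
mean_abs = np.abs(shap_values).mean(axis=0)
print("features ranked by mean |SHAP|:", np.argsort(mean_abs)[::-1])
```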

16 pages, 918 KiB  
Review
Generative Pre-Trained Transformer-Empowered Healthcare Conversations: Current Trends, Challenges, and Future Directions in Large Language Model-Enabled Medical Chatbots
by James C. L. Chow, Valerie Wong and Kay Li
BioMedInformatics 2024, 4(1), 837-852; https://doi.org/10.3390/biomedinformatics4010047 - 14 Mar 2024
Viewed by 987
Abstract
This review explores the transformative integration of artificial intelligence (AI) and healthcare through conversational AI leveraging Natural Language Processing (NLP). Focusing on Large Language Models (LLMs), this paper navigates through various sections, commencing with an overview of AI’s significance in healthcare and the role of conversational AI. It delves into fundamental NLP techniques, emphasizing their facilitation of seamless healthcare conversations. Examining the evolution of LLMs within NLP frameworks, the paper discusses key models used in healthcare, exploring their advantages and implementation challenges. Practical applications in healthcare conversations, from patient-centric utilities like diagnosis and treatment suggestions to healthcare provider support systems, are detailed. Ethical and legal considerations, including patient privacy, ethical implications, and regulatory compliance, are addressed. The review concludes by spotlighting current challenges, envisaging future trends, and highlighting the transformative potential of LLMs and NLP in reshaping healthcare interactions. Full article
(This article belongs to the Special Issue Computational Biology and Artificial Intelligence in Medicine)

14 pages, 1861 KiB  
Article
Overall Survival Time Estimation for Epithelioid Peritoneal Mesothelioma Patients from Whole-Slide Images
by Kleanthis Marios Papadopoulos, Panagiotis Barmpoutis, Tania Stathaki, Vahan Kepenekian, Peggy Dartigues, Séverine Valmary-Degano, Claire Illac-Vauquelin, Gerlinde Avérous, Anne Chevallier, Marie-Hélène Laverriere, Laurent Villeneuve, Olivier Glehen, Sylvie Isaac, Juliette Hommell-Fontaine, Francois Ng Kee Kwong and Nazim Benzerdjeb
BioMedInformatics 2024, 4(1), 823-836; https://doi.org/10.3390/biomedinformatics4010046 - 13 Mar 2024
Viewed by 679
Abstract
Background: The advent of Deep Learning initiated a new era in which neural networks relying solely on Whole-Slide Images can estimate the survival time of cancer patients. Remarkably, despite deep learning’s potential in this domain, no prior research has been conducted on image-based survival analysis specifically for peritoneal mesothelioma. Prior studies performed statistical analysis to identify disease factors impacting patients’ survival time. Methods: Therefore, we introduce MPeMSupervisedSurv, a Convolutional Neural Network designed to predict the survival time of patients diagnosed with this disease. We subsequently perform patient stratification based on factors such as their Peritoneal Cancer Index and on whether patients received chemotherapy treatment. Results: MPeMSupervisedSurv demonstrates improvements over comparable methods. Using our proposed model, we performed patient stratification to assess the impact of clinical variables on survival time. Notably, the inclusion of information regarding adjuvant chemotherapy significantly enhances the model’s predictive prowess. Conversely, repeating the process for other factors did not yield significant performance improvements. Conclusions: Overall, MPeMSupervisedSurv is an effective neural network which can predict the survival time of peritoneal mesothelioma patients. Our findings also indicate that treatment by adjuvant chemotherapy could be a factor affecting survival time. Full article

12 pages, 5186 KiB  
Article
Genetic Optimization in Uncovering Biologically Meaningful Gene Biomarkers for Glioblastoma Subtypes
by Petros Paplomatas, Ioanna-Efstathia Douroumi, Panagiotis Vlamos and Aristidis Vrahatis
BioMedInformatics 2024, 4(1), 811-822; https://doi.org/10.3390/biomedinformatics4010045 - 08 Mar 2024
Viewed by 529
Abstract
Background: Glioblastoma multiforme (GBM) is a highly aggressive brain cancer known for its challenging survival rates; it is characterized by distinct subtypes, such as the proneural and mesenchymal states. The development of targeted therapies is critically dependent on a thorough understanding of these subtypes. Advances in single-cell RNA-sequencing (scRNA-seq) have opened new avenues for identifying subtype-specific gene biomarkers, which are essential for innovative treatments. Methods: This study introduces a genetic optimization algorithm designed to select a precise set of genes that clearly differentiate between the proneural and mesenchymal GBM subtypes. By integrating differential gene expression analysis with gene variability assessments, our dual-criterion strategy ensures the selection of genes that are not only differentially expressed between subtypes but also exhibit consistent variability patterns. This approach enhances the biological relevance of identified biomarkers. We applied this algorithm to scRNA-seq data from GBM samples, focusing on the discovery of subtype-specific gene biomarkers. Results: The application of our genetic optimization algorithm to scRNA-seq data successfully identified significant genes that are closely associated with the fundamental characteristics of GBM. These genes show a strong potential to distinguish between the proneural and mesenchymal subtypes, offering insights into the molecular underpinnings of GBM heterogeneity. Conclusions: This study introduces a novel approach for biomarker discovery in GBM that is potentially applicable to other complex diseases. By leveraging scRNA-seq data, our method contributes to the development of targeted therapies, highlighting the importance of precise biomarker identification in personalized medicine. Full article
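The genetic optimization idea can be sketched generically as a population of binary gene masks evolved toward a fitness score; the toy example below uses a cross-validated classifier on synthetic expression data as the fitness and is only a schematic stand-in for the dual-criterion algorithm described in the paper.

```python
# Toy genetic algorithm for selecting a gene subset (schematic, synthetic data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=50, n_informative=8, random_state=0)

def fitness(mask):
    if mask.sum() == 0:
        return 0.0
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

pop = rng.integers(0, 2, size=(20, X.shape[1]))          # population of binary gene masks
for generation in range(15):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-10:]]               # keep the fittest half
    children = []
    for _ in range(10):
        a, b = parents[rng.integers(10)], parents[rng.integers(10)]
        cut = rng.integers(1, X.shape[1])                  # single-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(X.shape[1]) < 0.02               # mutation
        child[flip] = 1 - child[flip]
        children.append(child)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected genes:", np.where(best == 1)[0])
```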

15 pages, 1079 KiB  
Article
Pediatric and Adolescent Seizure Detection: A Machine Learning Approach Exploring the Influence of Age and Sex in Electroencephalogram Analysis
by Lan Wei and Catherine Mooney
BioMedInformatics 2024, 4(1), 796-810; https://doi.org/10.3390/biomedinformatics4010044 - 06 Mar 2024
Viewed by 476
Abstract
Background: Epilepsy, a prevalent neurological disorder characterized by recurrent seizures affecting an estimated 70 million people worldwide, poses a significant diagnostic challenge. EEG serves as an important tool in identifying these seizures, but the manual examination of EEGs by experts is time-consuming. To expedite this process, automated seizure detection methods have emerged as powerful aids for expert EEG analysis. It is worth noting that while such methods are well-established for adult EEGs, they have been underdeveloped for pediatric and adolescent EEGs. This study sought to address this gap by devising an automatic seizure detection system tailored for pediatric and adolescent EEG data. Methods: Leveraging publicly available datasets, the TUH pediatric and adolescent EEG and CHB-MIT EEG datasets, the machine learning-based models were constructed. The TUH pediatric and adolescent EEG dataset was divided into training (n = 118), validation (n = 19), and testing (n = 37) subsets, with special attention to ensure a clear demarcation between the individuals in the training and test sets to preserve the test set’s independence. The CHB-MIT EEG dataset was used as an external test set. Age and sex were incorporated as features in the models to investigate their potential influence on seizure detection. Results: By leveraging 20 features extracted from both time and frequency domains, along with age as an additional feature, the method achieved an accuracy of 98.95% on the TUH test set and 64.82% on the CHB-MIT external test set. Our investigation revealed that age is a crucial factor for accurate seizure detection in pediatric and adolescent EEGs. Conclusion: The outcomes of this study hold substantial promise in supporting researchers and clinicians engaged in the automated analysis of seizures in pediatric and adolescent EEGs. Full article
(This article belongs to the Special Issue Editor-in-Chief's Choices in Biomedical Informatics)
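A simplified version of the kind of time/frequency feature extraction described above might compute band powers from an EEG segment with Welch's method and append age as an extra feature. This is a generic sketch with synthetic signals and assumed band definitions, not the study's 20-feature pipeline.

```python
# Sketch: band-power features from an EEG segment plus age, fed to a classifier.
import numpy as np
from scipy.signal import welch
from sklearn.ensemble import RandomForestClassifier

fs = 256                                    # assumed sampling rate (Hz)
bands = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def band_power_features(segment, fs):
    freqs, psd = welch(segment, fs=fs, nperseg=fs * 2)
    feats = []
    for lo, hi in bands.values():
        idx = (freqs >= lo) & (freqs < hi)
        feats.append(np.trapz(psd[idx], freqs[idx]))    # integrate the PSD over the band
    return feats

rng = np.random.default_rng(0)
n_segments = 120
y = rng.integers(0, 2, n_segments)                      # 1 = seizure, 0 = background (synthetic)
ages = rng.integers(1, 18, n_segments)
X = []
for i in range(n_segments):
    segment = rng.standard_normal(fs * 10)              # 10 s of placeholder EEG
    X.append(band_power_features(segment, fs) + [ages[i]])

clf = RandomForestClassifier(random_state=0).fit(np.array(X), y)
print("training accuracy:", clf.score(np.array(X), y))
```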

16 pages, 3180 KiB  
Article
The Effect of Data Missingness on Machine Learning Predictions of Uncontrolled Diabetes Using All of Us Data
by Zain Jabbar and Peter Washington
BioMedInformatics 2024, 4(1), 780-795; https://doi.org/10.3390/biomedinformatics4010043 - 06 Mar 2024
Viewed by 500
Abstract
Electronic Health Records (EHR) provide a vast amount of patient data that are relevant to predicting clinical outcomes. The inherent presence of missing values poses challenges to building performant machine learning models. This paper aims to investigate the effect of various imputation methods on the National Institutes of Health’s All of Us dataset, a dataset containing a high degree of data missingness. We apply several imputation techniques such as mean substitution, constant filling, and multiple imputation on the same dataset for the task of diabetes prediction. We find that imputing values causes heteroskedastic performance for machine learning models with increased data missingness. That is, the more missing values a patient has for their tests, the higher variance there is on a diabetes model AUROC, F1, precision, recall, and accuracy scores. This highlights a critical challenge in using EHR data for predictive modeling. This work highlights the need for future research to develop methodologies to mitigate the effects of missing data and heteroskedasticity in EHR-based predictive models. Full article
(This article belongs to the Special Issue Feature Papers in Medical Statistics and Data Science Section)
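The comparison of imputation strategies can be sketched along the following lines, using scikit-learn's mean, constant, and iterative (multiple-imputation-style) imputers on a synthetic table with injected missingness; this is not the All of Us workflow itself.

```python
# Sketch: comparing imputation strategies on data with injected missingness.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (activates IterativeImputer)
from sklearn.impute import SimpleImputer, IterativeImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X[rng.random(X.shape) < 0.3] = np.nan       # ~30% of values missing at random

imputers = {
    "mean": SimpleImputer(strategy="mean"),
    "constant": SimpleImputer(strategy="constant", fill_value=0.0),
    "iterative": IterativeImputer(random_state=0),
}
for name, imputer in imputers.items():
    pipe = make_pipeline(imputer, LogisticRegression(max_iter=1000))
    scores = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
    print(f"{name:>9}: AUROC {scores.mean():.3f} +/- {scores.std():.3f}")
```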

26 pages, 629 KiB  
Review
Machine Learning Models and Technologies for Evidence-Based Telehealth and Smart Care: A Review
by Stella C. Christopoulou
BioMedInformatics 2024, 4(1), 754-779; https://doi.org/10.3390/biomedinformatics4010042 - 04 Mar 2024
Viewed by 729
Abstract
Background: Over the past few years, clinical studies have utilized machine learning in telehealth and smart care for disease management, self-management, and managing health issues like pulmonary diseases, heart failure, diabetes screening, and intraoperative risks. However, a systematic review of machine learning’s use in evidence-based telehealth and smart care is lacking, as evidence-based practice aims to eliminate biases and subjective opinions. Methods: The author conducted a mixed methods review to explore machine learning applications in evidence-based telehealth and smart care. A systematic search of the literature was performed during 16 June 2023–27 June 2023 in Google Scholar, PubMed, and the clinical registry platform ClinicalTrials.gov. The author included articles in the review if they were implemented by evidence-based health informatics and concerned with telehealth and smart care technologies. Results: The author identified 18 key studies (17 clinical trials) from 175 citations found in internet databases and categorized them using problem-specific groupings, medical/health domains, machine learning models, algorithms, and techniques. Conclusions: Machine learning combined with the application of evidence-based practices in healthcare can enhance telehealth and smart care strategies by improving the quality of personalized care, early detection of health-related problems, patient quality of life, patient-physician communication, resource efficiency and cost-effectiveness. However, this requires interdisciplinary expertise and collaboration among stakeholders, including clinicians, informaticians, and policymakers. Therefore, further research using clinical studies, systematic reviews, analyses, and meta-analyses is required to fully exploit the potential of machine learning in this area. Full article
(This article belongs to the Special Issue Feature Papers in Clinical Informatics Section)

21 pages, 4992 KiB  
Article
Forecasting Survival Rates in Metastatic Colorectal Cancer Patients Undergoing Bevacizumab-Based Chemotherapy: A Machine Learning Approach
by Sergio Sánchez-Herrero, Abtin Tondar, Elena Perez-Bernabeu, Laura Calvet and Angel A. Juan
BioMedInformatics 2024, 4(1), 733-753; https://doi.org/10.3390/biomedinformatics4010041 - 02 Mar 2024
Viewed by 520
Abstract
Background: Antibiotics can play a pivotal role in the treatment of colorectal cancer (CRC) at various stages of the disease, both directly and indirectly. Identifying novel patterns of antibiotic effects or responses in CRC within extensive medical data poses a significant challenge that can be addressed through algorithmic approaches. Machine Learning (ML) emerges as a promising solution for predicting clinical outcomes using clinical and heterogeneous cancer data. In the pursuit of our objective, we employed ML techniques for predicting CRC mortality and antibiotic influence. Methods: We utilized a dataset to examine the accuracy of death prediction in metastatic colorectal cancer. In addition, we analyzed the association between antibiotic exposure and mortality in metastatic colorectal cancer. The dataset comprised 147 patients, nineteen independent variables, and one dependent variable. Our analysis involved testing different supervised ML classifiers, combined with an oversampling pool for the classification models: Logistic Regression, Decision Trees, Naive Bayes, Support Vector Machine, Random Forest, an XGBoost Classifier, a consensus of all models, and a consensus of top models (meta models). Results: The consensus of the top models’ classifier exhibited the highest accuracy among the algorithms tested (93%). This model met the standards for good accuracy, surpassing the 90% threshold considered useful in ML applications. Consistent with the accuracy results, other metrics were also strong, including precision (0.96), recall (0.93), F-Beta (0.94), and AUC (0.93). Hazard ratio analysis suggests that there is no discernible difference between patients who received antibiotics and those who did not. Conclusions: Our modelling approach provides an alternative for analyzing and predicting the relationship between antibiotics and mortality in metastatic colorectal cancer patients treated with bevacizumab, complementing classic statistical methods. This methodology lays the groundwork for future use of datasets in cancer treatment research and highlights the advantages of meta models. Full article
(This article belongs to the Special Issue Feature Papers in Computational Biology and Medicine)
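One common way to build the kind of model consensus mentioned above is soft voting over several fitted classifiers; the short sketch below illustrates that idea on synthetic data and is not the authors' exact meta-model or oversampling setup.

```python
# Sketch: a simple "consensus of models" via soft voting (synthetic data).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, n_features=19, weights=[0.7, 0.3], random_state=0)

consensus = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(random_state=0)),
        ("nb", GaussianNB()),
    ],
    voting="soft",          # average predicted probabilities across the base models
)
print("consensus accuracy:", cross_val_score(consensus, X, y, cv=5).mean())
```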

12 pages, 2906 KiB  
Article
Predictive Analysis of Endoscope Demand in Otolaryngology Outpatient Settings
by David Lanier, Cristie Roush, Gwendolyn Young and Sara Masoud
BioMedInformatics 2024, 4(1), 721-732; https://doi.org/10.3390/biomedinformatics4010040 - 02 Mar 2024
Cited by 1 | Viewed by 457
Abstract
Background: In some Ear, Nose, and Throat (ENT) clinics, there has been a trend to transition the reprocessing of flexible endoscopes from centralized high-level disinfection (HLD) to sterilization performed by nursing staff. In doing so, the clinic nursing staff are responsible for predicting and managing clinical demand for flexible endoscopes. The HLD disinfection process is time-consuming and requires specialized training and competency to be performed safely. Depending solely on human expertise to predict flexible endoscope demand is unreliable and has raised concerns about an inadequate supply of devices available for diagnostic purposes. Method: The demand for flexible endoscopes for future patient visits has not been well studied but can be modeled based on patients’ historical information, provider, and other visit-related factors. Such factors are available to the clinic before the visit. Binary classifiers can be used to help inform the sterile processing department of reprocessing needs days or weeks earlier for each patient. Results: Among all our trained models, Logistic Regression reports an average ROC AUC score of 89% and an accuracy of 80%. Conclusion: The proposed framework not only significantly reduces the reprocessing effort in terms of time spent on communication, cleaning, scheduling, and transferring scopes, but also helps to improve patient safety by reducing the exposure risk to potential infections. Full article
(This article belongs to the Special Issue Feature Papers in Applied Biomedical Data Science)

12 pages, 1024 KiB  
Article
The Development and Usability Assessment of an Augmented Reality Decision Support System to Address Burn Patient Management
by Sena Veazey, Nicole Caldwell, David Luellen, Angela Samosorn, Allison McGlasson, Patricia Colston, Craig Fenrich, Jose Salinas, Jared Mike, Jacob Rivera and Maria Serio-Melvin
BioMedInformatics 2024, 4(1), 709-720; https://doi.org/10.3390/biomedinformatics4010039 - 01 Mar 2024
Viewed by 1124
Abstract
Critical care injuries, such as burn trauma, require specialized skillsets and knowledge. A clinical decision support system to aid clinicians in providing burn patient management can increase proficiency and provide knowledge content for specific interventions. In austere environments, decision support tools can be used to aid in decision making and task guidance when skilled personnel or resources are limited. Therefore, we developed a novel software system that utilizes augmented reality (AR) capabilities to provide enhanced step-by-step instructions based on best practices for managing burn patients. To better understand how new technologies, such as AR, can be used for burn care management, we developed a burn care application for use on a heads-up display. We developed four sub-set applications for documenting and conducting burn wound mapping, fluid resuscitation, medication calculations, and an escharotomy. After development, we conducted a usability study utilizing the System Usability Scale, pre- and post- simulation surveys, and after-action reviews to evaluate the AR-based software application in a simulation scenario. Results of the study indicate that the decision support tool has generalized usability and subjects were able to use the software as intended. Here we present the first use case of a comprehensive burn management system utilizing augmented reality capabilities to deliver care. Full article
(This article belongs to the Special Issue Feature Papers in Computational Biology and Medicine)
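For reference, the System Usability Scale score used in the usability assessment above is computed from ten 1–5 item responses; a small helper implementing the standard SUS formula is sketched below (the responses shown are invented).

```python
# Standard SUS scoring: odd items contribute (response - 1), even items (5 - response),
# and the summed contributions are scaled by 2.5 to give a 0-100 score.
def sus_score(responses):
    if len(responses) != 10:
        raise ValueError("SUS expects exactly 10 item responses (1-5 each)")
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

# Example with made-up responses from one participant.
print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))   # 85.0
```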

19 pages, 3556 KiB  
Article
Reliability and Agreement of Free Web-Based 3D Software for Computing Facial Area and Volume Measurements
by Oguzhan Topsakal, Philip Sawyer, Tahir Cetin Akinci, Elif Topsakal and M. Mazhar Celikoyar
BioMedInformatics 2024, 4(1), 690-708; https://doi.org/10.3390/biomedinformatics4010038 - 01 Mar 2024
Viewed by 478
Abstract
Background: Facial surgeries require meticulous planning and outcome assessments, where facial analysis plays a critical role. This study introduces a new approach by utilizing three-dimensional (3D) imaging techniques, which are known for their ability to measure facial areas and volumes accurately. The purpose of this study is to introduce and evaluate a free web-based software application designed to take area and volume measurements on 3D models of patient faces. Methods: This study employed the online facial analysis software to conduct ten measurements on 3D models of subjects, including five measurements of area and five measurements of volume. These measurements were then compared with those obtained from the established 3D modeling software called Blender (version 3.2) using the Bland–Altman plot. To ensure accuracy, the intra-rater and inter-rater reliabilities of the web-based software were evaluated using the Intraclass Correlation Coefficient (ICC) method. Additionally, statistical assumptions such as normality and homoscedasticity were rigorously verified before analysis. Results: This study found that the web-based facial analysis software showed high agreement with the 3D software Blender within 95% confidence limits. Moreover, the online application demonstrated excellent intra-rater and inter-rater reliability in most analyses, as indicated by the ICC test. Conclusion: The findings suggest that the free online 3D software is reliable for facial analysis, particularly in measuring areas and volumes. This indicates its potential utility in enhancing surgical planning and evaluation in facial surgeries. This study underscores the software’s capability to improve surgical outcomes by integrating precise area and volume measurements into facial surgery planning and assessment processes. Full article
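As a quick reminder of the agreement analysis used above, Bland–Altman limits of agreement are simply the mean difference between paired measurements plus or minus 1.96 standard deviations; a minimal computation on invented paired area measurements is sketched below.

```python
# Minimal Bland-Altman limits of agreement for paired measurements (invented values, cm^2).
import numpy as np

web_app = np.array([12.1, 8.4, 15.2, 9.9, 11.3, 14.0])   # hypothetical web-based measurements
blender = np.array([12.4, 8.1, 15.0, 10.2, 11.0, 14.3])  # hypothetical Blender measurements

diff = web_app - blender
bias = diff.mean()
loa = 1.96 * diff.std(ddof=1)                             # 95% limits of agreement
print(f"bias = {bias:.2f}, limits of agreement = [{bias - loa:.2f}, {bias + loa:.2f}]")
```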

17 pages, 325 KiB  
Review
Revolutionizing Kidney Transplantation: Connecting Machine Learning and Artificial Intelligence with Next-Generation Healthcare—From Algorithms to Allografts
by Luís Ramalhete, Paula Almeida, Raquel Ferreira, Olga Abade, Cristiana Teixeira and Rúben Araújo
BioMedInformatics 2024, 4(1), 673-689; https://doi.org/10.3390/biomedinformatics4010037 - 01 Mar 2024
Viewed by 1017
Abstract
This review explores the integration of artificial intelligence (AI) and machine learning (ML) into kidney transplantation (KT), set against the backdrop of a significant donor organ shortage and the evolution of ‘Next-Generation Healthcare’. Its purpose is to evaluate how AI and ML can enhance the transplantation process, from donor selection to postoperative patient care. Our methodology involved a comprehensive review of current research, focusing on the application of AI and ML in various stages of KT. This included an analysis of donor–recipient matching, predictive modeling, and the improvement in postoperative care. The results indicated that AI and ML significantly improve the efficiency and success rates of KT. They aid in better donor–recipient matching, reduce organ rejection, and enhance postoperative monitoring and patient care. Predictive modeling, based on extensive data analysis, has been particularly effective in identifying suitable organ matches and anticipating postoperative complications. In conclusion, this review discusses the transformative impact of AI and ML in KT, offering more precise, personalized, and effective healthcare solutions. Their integration into this field addresses critical issues like organ shortages and post-transplant complications. However, the successful application of these technologies requires careful consideration of their ethical, privacy, and training aspects in healthcare settings. Full article

12 pages, 1363 KiB  
Article
Classification and Explanation of Iron Deficiency Anemia from Complete Blood Count Data Using Machine Learning
by Siddartha Pullakhandam and Susan McRoy
BioMedInformatics 2024, 4(1), 661-672; https://doi.org/10.3390/biomedinformatics4010036 - 01 Mar 2024
Viewed by 1039
Abstract
Background: Currently, discriminating Iron Deficiency Anemia (IDA) from other anemia requires an expensive test (serum ferritin). Complete Blood Count (CBC) tests are less costly and more widely available. Machine learning models have not yet been applied to discriminating IDA but do well for similar tasks. Methods: We constructed multiple machine learning methods to classify IDA from CBC data using a US NHANES dataset of over 19,000 instances, calculating accuracy, precision, recall, and precision AUC (PR AUC). We validated the results using an unseen dataset from Kenya, using the same model. We calculated ranked feature importance to explain the global behavior of the model. Results: Our model classifies IDA with a PR AUC of 0.87 and recall/sensitivity of 0.98 and 0.89 for the original dataset and an unseen Kenya dataset, respectively. The explanations indicate that low blood level of hemoglobin, higher age, and higher Red Blood Cell distribution width were most critical. We also found that optimization made only minor changes to the explanations and that the features used remained consistent with professional practice. Conclusions: The overall high performance and consistency of the results suggest that the approach would be acceptable to health professionals and would support enhancements to current automated CBC analyzers. Full article

23 pages, 8260 KiB  
Article
Naturalize Revolution: Unprecedented AI-Driven Precision in Skin Cancer Classification Using Deep Learning
by Mohamad Abou Ali, Fadi Dornaika, Ignacio Arganda-Carreras, Hussein Ali and Malak Karaouni
BioMedInformatics 2024, 4(1), 638-660; https://doi.org/10.3390/biomedinformatics4010035 - 01 Mar 2024
Cited by 1 | Viewed by 712
Abstract
Background: In response to the escalating global concerns surrounding skin cancer, this study aims to address the imperative for precise and efficient diagnostic methodologies. Focusing on the intricate task of eight-class skin cancer classification, the research delves into the limitations of conventional diagnostic approaches, often hindered by subjectivity and resource constraints. The transformative potential of Artificial Intelligence (AI) in revolutionizing diagnostic paradigms is underscored, emphasizing significant improvements in accuracy and accessibility. Methods: Utilizing cutting-edge deep learning models on the ISIC2019 dataset, a comprehensive analysis is conducted, employing a diverse array of pre-trained ImageNet architectures and Vision Transformer models. To counteract the inherent class imbalance in skin cancer datasets, a pioneering “Naturalize” augmentation technique is introduced. This technique leads to the creation of two indispensable datasets—the Naturalized 2.4K ISIC2019 and groundbreaking Naturalized 7.2K ISIC2019 datasets—catalyzing advancements in classification accuracy. The “Naturalize” augmentation technique involves the segmentation of skin cancer images using the Segment Anything Model (SAM) and the systematic addition of segmented cancer images to a background image to generate new composite images. Results: The research showcases the pivotal role of AI in mitigating the risks of misdiagnosis and under-diagnosis in skin cancer. The proficiency of AI in analyzing vast datasets and discerning subtle patterns significantly augments the diagnostic prowess of dermatologists. Quantitative measures such as confusion matrices, classification reports, and visual analyses using Score-CAM across diverse dataset variations are meticulously evaluated. The culmination of these endeavors resulted in an unprecedented achievement—100% average accuracy, precision, recall, and F1-score—within the groundbreaking Naturalized 7.2K ISIC2019 dataset. Conclusion: This groundbreaking exploration highlights the transformative capabilities of AI-driven methodologies in reshaping the landscape of skin cancer diagnosis and patient care. The research represents a pivotal stride towards redefining dermatological diagnosis, showcasing the remarkable impact of AI-powered solutions in surmounting the challenges inherent in skin cancer diagnosis. The attainment of 100% across crucial metrics within the Naturalized 7.2K ISIC2019 dataset serves as a testament to the transformative capabilities of AI-driven approaches in reshaping the trajectory of skin cancer diagnosis and patient care. This pioneering work paves the way for a new era in dermatological diagnostics, heralding the dawn of unprecedented precision and efficacy in the identification and classification of skin cancers. Full article
(This article belongs to the Special Issue Computational Biology and Artificial Intelligence in Medicine)
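The compositing step of the augmentation described above (pasting segmented lesions onto a background image) can be sketched with a binary mask and NumPy; the SAM segmentation itself is omitted and the arrays below are synthetic placeholders.

```python
# Sketch of mask-based compositing: paste a segmented lesion onto a background image.
# The segmentation mask would come from a model such as SAM; here it is synthetic.
import numpy as np

rng = np.random.default_rng(0)
background = rng.integers(0, 255, size=(224, 224, 3), dtype=np.uint8)   # placeholder background
lesion = rng.integers(0, 255, size=(64, 64, 3), dtype=np.uint8)         # placeholder lesion crop
mask = np.zeros((64, 64), dtype=bool)
mask[16:48, 16:48] = True                                               # placeholder binary mask

def composite(background, lesion, mask, top, left):
    out = background.copy()
    region = out[top:top + mask.shape[0], left:left + mask.shape[1]]
    region[mask] = lesion[mask]          # copy only the masked lesion pixels
    return out

augmented = composite(background, lesion, mask, top=80, left=100)
print(augmented.shape)                   # (224, 224, 3) composite image
```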

15 pages, 3229 KiB  
Article
Real-Time Jaundice Detection in Neonates Based on Machine Learning Models
by Ahmad Yaseen Abdulrazzak, Saleem Latif Mohammed, Ali Al-Naji and Javaan Chahl
BioMedInformatics 2024, 4(1), 623-637; https://doi.org/10.3390/biomedinformatics4010034 - 24 Feb 2024
Viewed by 738
Abstract
Introduction: Despite the many attempts made by researchers to diagnose jaundice non-invasively using machine learning techniques, the low amount of data used to build their models remains the key factor limiting the performance of their models. Objective: To build a system to diagnose neonatal jaundice non-invasively based on machine learning algorithms created based on a dataset comprising 767 infant images using a computer device and a USB webcam. Methods: The first stage of the proposed system was to evaluate the performance of four machine learning algorithms, namely support vector machine (SVM), k nearest neighbor (k-NN), random forest (RF), and extreme gradient boost (XGBoost), based on a dataset of 767 infant images. The algorithm with the best performance was chosen as the classifying algorithm in the developed application. The second stage included designing an application that enables the user to perform jaundice detection for a patient under test with the minimum effort required by capturing the patient’s image using a USB webcam. Results: The obtained results of the first stage of the machine learning algorithms evaluation process indicated that XGBoost outperformed the rest of the algorithms by obtaining an accuracy of 99.63%. The second-best algorithm was the RF algorithm, which had an accuracy of 98.99%. Following RF, with a slight difference, was the k-NN algorithm. It achieved an accuracy of 98.25%. SVM scored the lowest performance among the above three algorithms, with an accuracy of 96.22%. Based on these obtained results, the XGBoost algorithm was chosen to be the classifier of the proposed system. In the second stage, the jaundice application was designed based on the model created by the XGBoost algorithm. This application ensured it was user friendly with as fast a processing time as possible. Conclusion: Early detection of neonatal jaundice is crucial due to the severity of its complications. A non-invasive system using a USB webcam and an XGBoost machine learning technique was proposed. The XGBoost algorithm achieved 99.63% accuracy and successfully diagnosed 10 out of 10 NICU infants with very little processing time. This denotes the efficiency of machine learning algorithms in healthcare in general and in monitoring systems specifically. Full article
(This article belongs to the Special Issue Feature Papers in Applied Biomedical Data Science)
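A stripped-down version of the classification stage might extract simple color statistics from a skin region and feed them to XGBoost; the feature choice and synthetic images below are illustrative assumptions, not the study's actual feature set.

```python
# Sketch: mean/std colour features from a skin ROI + XGBoost classifier (synthetic images).
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)

def color_features(image):
    # Per-channel mean and standard deviation over the region of interest.
    return np.concatenate([image.mean(axis=(0, 1)), image.std(axis=(0, 1))])

images = rng.integers(0, 255, size=(200, 64, 64, 3))       # placeholder skin ROIs
labels = rng.integers(0, 2, size=200)                       # 1 = jaundice, 0 = normal (synthetic)
X = np.array([color_features(img) for img in images])

clf = XGBClassifier(n_estimators=100, eval_metric="logloss", random_state=0)
clf.fit(X, labels)
print("training accuracy:", (clf.predict(X) == labels).mean())
```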

23 pages, 4265 KiB  
Article
Machine Learning Approach to Identify Case-Control Studies on ApoE Gene Mutations Linked to Alzheimer’s Disease in Italy
by Giorgia Francesca Saraceno, Diana Marisol Abrego-Guandique, Roberto Cannataro, Maria Cristina Caroleo and Erika Cione
BioMedInformatics 2024, 4(1), 600-622; https://doi.org/10.3390/biomedinformatics4010033 - 23 Feb 2024
Viewed by 1289
Abstract
Background: An application of artificial intelligence is machine learning, which allows computer programs to learn and create data. Methods: In this work, we aimed to evaluate the performance of the MySLR machine learning platform, which implements the Latent Dirichlet Allocation (LDA) algorithm, in identifying and screening papers in the literature that focus on mutations of the apolipoprotein E (ApoE) gene in Italian Alzheimer’s Disease patients. Results: MySLR excludes duplicates and creates topics. MySLR was applied to analyze a set of 164 scientific publications. After duplicate removal, the results allowed us to identify 92 papers, divided into two relevant topics characterizing the investigated research area. Topic 1 contains 70 papers, and topic 2 contains the remaining 22. Despite the current limitations, the available evidence suggests that articles containing studies on Italian Alzheimer’s Disease (AD) patients accounted for 65.22% (n = 60). Furthermore, papers about mutations of the ApoE gene, including single nucleotide polymorphisms (SNPs), the primary genetic risk factor for AD, in the Italian population accounted for 5.4% (n = 5). Conclusion: The results show that the machine learning platform helped to identify case-control studies on ApoE gene mutations, including SNPs, although not only those conducted in Italy. Full article
(This article belongs to the Special Issue Feature Papers in Computational Biology and Medicine)
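Topic modelling of abstracts with LDA, the core technique behind the platform evaluated above, can be sketched with scikit-learn as follows; the corpus and the two-topic setting are placeholders.

```python
# Sketch: LDA topic modelling over a tiny placeholder corpus of abstracts.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

abstracts = [
    "apoe polymorphism in italian alzheimer patients case control study",
    "machine learning screening of genetic risk factors for dementia",
    "snp association with late onset alzheimer disease in an italian cohort",
    "deep learning for medical image segmentation of brain mri",
]

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(abstracts)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top_terms = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"topic {k}: {top_terms}")
print(lda.transform(counts))     # per-document topic proportions, used to group papers
```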

34 pages, 3584 KiB  
Review
Development and Practical Applications of Computational Intelligence Technology
by Yasunari Matsuzaka and Ryu Yashiro
BioMedInformatics 2024, 4(1), 566-599; https://doi.org/10.3390/biomedinformatics4010032 - 22 Feb 2024
Viewed by 554
Abstract
Computational intelligence (CI) uses applied computational methods for problem-solving inspired by the behavior of humans and animals. Biological systems are used to construct software to solve complex problems, and one type of such system is an artificial immune system (AIS), which imitates the immune system of a living body. AISs have been used to solve problems that require identification and learning, such as computer virus identification and removal, image identification, and function optimization problems. In the body’s immune system, a wide variety of cells work together to distinguish between the self and non-self and to eliminate the non-self. AISs enable learning and discrimination by imitating part or all of the mechanisms of a living body’s immune system. Certainly, some deep neural networks have exceptional performance that far surpasses that of humans in certain tasks, but to build such a network, a huge amount of data is first required. These networks are used in a wide range of applications, such as extracting knowledge from a large amount of data, learning from past actions, and creating the optimal solution (the optimization problem). A new technique for pre-training natural language processing (NLP) models using transformers, called Bidirectional Encoder Representations from Transformers (BERT), builds on recent research in pre-training contextual representations, including Semi-Supervised Sequence Learning, Generative Pre-Training, ELMo (Embeddings from Language Models), which is a method for obtaining distributed representations that consider context, and ULMFiT (Universal Language Model Fine-Tuning). BERT is a method that can address the issue of the need for large amounts of data, which is inherent in large-scale models, by using pre-training with unlabeled data. An optimization problem involves “finding a solution that maximizes or minimizes an objective function under given constraints”. In recent years, machine learning approaches that consider pattern recognition as an optimization problem have become popular. This pattern recognition is an operation that associates patterns observed as spatial and temporal changes in signals with the classes to which they belong. It involves identifying and retrieving predetermined features and rules from data; however, the features and rules here are not logical information, but are found in images, sounds, etc. Therefore, pattern recognition is generally conducted by supervised learning. Based on a new theory that deals with the process by which the immune system learns from past infection experiences, the clonal selection of immune cells can be viewed as a learning rule of reinforcement learning. Full article
(This article belongs to the Special Issue Computational Biology and Artificial Intelligence in Medicine)

17 pages, 1859 KiB  
Article
Assessment of Voice Disorders Using Machine Learning and Vocal Analysis of Voice Samples Recorded through Smartphones
by Michele Giuseppe Di Cesare, David Perpetuini, Daniela Cardone and Arcangelo Merla
BioMedInformatics 2024, 4(1), 549-565; https://doi.org/10.3390/biomedinformatics4010031 - 19 Feb 2024
Cited by 1 | Viewed by 890
Abstract
Background: The integration of edge computing into smart healthcare systems requires the development of computationally efficient models and methodologies for monitoring and detecting patients’ healthcare statuses. In this context, mobile devices, such as smartphones, are increasingly employed for the purpose of aiding diagnosis, treatment, and monitoring. Notably, smartphones are widely pervasive and readily accessible to a significant portion of the population. These devices empower individuals to conveniently record and submit voice samples, thereby potentially facilitating the early detection of vocal irregularities or changes. This research focuses on the creation of diverse machine learning frameworks based on vocal samples captured by smartphones to distinguish between pathological and healthy voices. Methods: The investigation leverages the publicly available VOICED dataset, comprising 58 healthy voice samples and 150 samples from voices exhibiting pathological conditions, and machine learning techniques for the classification of healthy and diseased patients through the employment of Mel-frequency cepstral coefficients. Results: Through cross-validated two-class classification, the fine k-nearest neighbor exhibited the highest performance, achieving an accuracy rate of 98.3% in identifying healthy and pathological voices. Conclusions: This study holds promise for enabling smartphones to effectively identify vocal disorders, offering a multitude of advantages for both individuals and healthcare systems, encompassing heightened accessibility, early detection, and continuous monitoring. Full article
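The MFCC-plus-classifier pipeline described above can be sketched roughly as follows with librosa and scikit-learn; the synthetic sine-based "recordings", labels, and k value are placeholders rather than the VOICED protocol.

```python
# Sketch: MFCC features from voice signals + k-nearest-neighbour classification.
# Synthetic sine-based signals stand in for smartphone voice recordings.
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier

sr = 16000

def mfcc_features(signal, sr, n_mfcc=13):
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

rng = np.random.default_rng(0)
t = np.linspace(0, 1.0, sr, endpoint=False)
signals = [np.sin(2 * np.pi * (120 + 40 * i) * t) + 0.05 * rng.standard_normal(sr)
           for i in range(6)]                               # placeholder voice samples
labels = np.array([0, 0, 0, 1, 1, 1])                       # 1 = pathological (synthetic)

X = np.array([mfcc_features(s.astype(np.float32), sr) for s in signals])
clf = KNeighborsClassifier(n_neighbors=1).fit(X, labels)    # a "fine" kNN roughly means a small k
print(clf.predict(X))
```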

7 pages, 983 KiB  
Editorial
From Code to Cure: The Impact of Artificial Intelligence in Biomedical Applications
by M. Michael Gromiha, Palanisamy Preethi and Medha Pandey
BioMedInformatics 2024, 4(1), 542-548; https://doi.org/10.3390/biomedinformatics4010030 - 18 Feb 2024
Viewed by 583
Abstract
Artificial intelligence (AI), a branch of computer science, involves developing intelligent computer programs to mimic human intelligence and automate various processes [...] Full article

23 pages, 2217 KiB  
Article
Deep Learning-Based Detection of Learning Disorders on a Large Scale Dataset of Eye Movement Records
by Alae Eddine El Hmimdi, Zoï Kapoula and Vivien Sainte Fare Garnot
BioMedInformatics 2024, 4(1), 519-541; https://doi.org/10.3390/biomedinformatics4010029 - 14 Feb 2024
Viewed by 685
Abstract
Early detection of dyslexia and learning disorders is vital for avoiding a learning disability, as well as supporting dyslexic students by tailoring academic programs to their needs. Several studies have investigated using supervised algorithms to screen dyslexia vs. control subjects; however, the data size and the conditions of data acquisition were their most significant limitation. In the current study, we leverage a large dataset, containing 4243 time series of eye movement records from children across Europe. These datasets were derived from various tests such as saccade, vergence, and reading tasks. Furthermore, our methods were evaluated with realistic test data, including real-life biases such as noise, eye tracking misalignment, and similar pathologies among non-scholar difficulty classes. In addition, we present a novel convolutional neural network architecture, adapted to our time series classification problem, that is intended to generalize on a small annotated dataset and to handle a high-resolution signal (1024 point). Our architecture achieved a precision of 80.20% and a recall of 75.1%, when trained on the vergence dataset, and a precision of 77.2% and a recall of 77.5% when trained on the saccade dataset. Finally, we performed a comparison using our ML approach, a second architecture developed for a similar problem, and two other methods that we investigated that use deep learning algorithms to predict dyslexia. Full article
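A compact 1D convolutional network for fixed-length (for example, 1024-point) time series, in the spirit of the architecture described above but not a reproduction of it, could look like the PyTorch sketch below.

```python
# Sketch: a small 1D CNN for classifying 1024-point eye-movement time series (PyTorch).
import torch
import torch.nn as nn

class TimeSeriesCNN(nn.Module):
    def __init__(self, n_channels=2, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 16, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),              # global average pooling over time
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):                          # x: (batch, channels, 1024)
        return self.classifier(self.features(x).squeeze(-1))

model = TimeSeriesCNN()
dummy = torch.randn(8, 2, 1024)                    # batch of 8 synthetic recordings
print(model(dummy).shape)                          # torch.Size([8, 2]) class logits
```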

30 pages, 29526 KiB  
Article
Whole Slide Image Understanding in Pathology: What Is the Salient Scale of Analysis?
by Eleanor Jenkinson and Ognjen Arandjelović
BioMedInformatics 2024, 4(1), 489-518; https://doi.org/10.3390/biomedinformatics4010028 - 14 Feb 2024
Cited by 1 | Viewed by 626
Abstract
Background: In recent years, there has been increasing research in the applications of Artificial Intelligence in the medical industry. Digital pathology has seen great success in introducing the use of technology in the digitisation and analysis of pathology slides to ease the burden of work on pathologists. Digitised pathology slides, otherwise known as whole slide images, can be analysed by pathologists with the same methods used to analyse traditional glass slides. Methods: The digitisation of pathology slides has also led to the possibility of using these whole slide images to train machine learning models to detect tumours. Patch-based methods are common in the analysis of whole slide images as these images are too large to be processed using normal machine learning methods. However, there is little work exploring the effect that the size of the patches has on the analysis. A patch-based whole slide image analysis method was implemented and then used to evaluate and compare the accuracy of the analysis using patches of different sizes. In addition, two different patch sampling methods were used to test whether the optimal patch size is the same for both methods, as well as a downsampling method in which low-resolution whole slide images are used to train an analysis model. Results: It was discovered that the most successful method uses a patch size of 256 × 256 pixels with the informed sampling method, using the location of tumour regions to sample a balanced dataset. Conclusion: Future work on patch-based analysis of whole slide images in pathology should take our findings into account when designing new models. Full article
(This article belongs to the Special Issue Feature Papers in Medical Statistics and Data Science Section)
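The patch-extraction step that the study varies can be sketched generically as tiling an image array into fixed-size patches; the simple grid tiling below is a stand-in for the paper's random and informed sampling strategies.

```python
# Sketch: tiling a (downsampled) whole slide image region into fixed-size patches.
import numpy as np

def extract_patches(image, patch_size=256):
    h, w = image.shape[:2]
    patches = []
    for top in range(0, h - patch_size + 1, patch_size):
        for left in range(0, w - patch_size + 1, patch_size):
            patches.append(image[top:top + patch_size, left:left + patch_size])
    return np.array(patches)

slide = np.random.randint(0, 255, size=(2048, 2048, 3), dtype=np.uint8)   # placeholder WSI region
patches = extract_patches(slide, patch_size=256)
print(patches.shape)    # (64, 256, 256, 3): an 8 x 8 grid of 256x256 patches
```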

12 pages, 594 KiB  
Article
Weighted Rank Difference Ensemble: A New Form of Ensemble Feature Selection Method for Medical Datasets
by Arju Manara Begum, M. Rubaiyat Hossain Mondal, Prajoy Podder and Joarder Kamruzzaman
BioMedInformatics 2024, 4(1), 477-488; https://doi.org/10.3390/biomedinformatics4010027 - 10 Feb 2024
Cited by 1 | Viewed by 591
Abstract
Background: Feature selection (FS), a crucial preprocessing step in machine learning, greatly reduces the dimension of data and improves model performance. This paper focuses on selecting features for medical data classification. Methods: In this work, a new form of ensemble FS method called weighted rank difference ensemble (WRD-Ensemble) has been put forth. It combines three FS methods to produce a stable and diverse subset of features. The three base FS approaches are Pearson’s correlation coefficient (PCC), reliefF, and gain ratio (GR). These three FS approaches produce three distinct lists of features, and then they order each feature by importance or weight. The final subset of features in this study is chosen using the average weight of each feature and the rank difference of a feature across three ranked lists. Using the average weight and rank difference of each feature, unstable and less significant features are eliminated from the feature space. The WRD-Ensemble method is applied to three medical datasets: chronic kidney disease (CKD), lung cancer, and heart disease. These data samples are classified using logistic regression (LR). Results: The experimental results show that compared to the base FS methods and other ensemble FS methods, the proposed WRD-Ensemble method leads to obtaining the highest accuracy value of 98.97% for CKD, 93.24% for lung cancer, and 83.84% for heart disease. Conclusion: The results indicate that the proposed WRD-Ensemble method can potentially improve the accuracy of disease diagnosis models, contributing to advances in clinical decision-making. Full article
(This article belongs to the Special Issue Feature Papers in Clinical Informatics Section)
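The aggregation idea, combining per-method feature weights and rank differences across ranked lists, can be sketched as below; the scoring rule and threshold are simplified assumptions, not the exact WRD-Ensemble formulation.

```python
# Sketch: combining three feature-weight lists via average weight and rank difference.
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(0)
n_features = 12
# Placeholder importance scores from three base selectors (e.g., PCC, reliefF, gain ratio).
weights = rng.random((3, n_features))

ranks = np.vstack([rankdata(-w) for w in weights])     # rank 1 = most important per method
avg_weight = weights.mean(axis=0)
rank_spread = ranks.max(axis=0) - ranks.min(axis=0)    # disagreement across the three lists

# Keep features that are both important on average and ranked consistently.
keep = (avg_weight > np.median(avg_weight)) & (rank_spread <= 4)
print("selected feature indices:", np.where(keep)[0])
```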

23 pages, 1754 KiB  
Article
An Interactive Dashboard for Statistical Analysis of Intensive Care Unit COVID-19 Data
by Rúben Dias, Artur Ferreira, Iola Pinto, Carlos Geraldes, Cristiana Von Rekowski and Luís Bento
BioMedInformatics 2024, 4(1), 454-476; https://doi.org/10.3390/biomedinformatics4010026 - 07 Feb 2024
Viewed by 776
Abstract
Background: COVID-19 caused a pandemic, due to its ease of transmission and high number of infections. The evolution of the pandemic and its consequences for the mortality and morbidity of populations, especially the elderly, generated several scientific studies and many research projects. Among them, we have the Predictive Models of COVID-19 Outcomes for Higher Risk Patients Towards a Precision Medicine (PREMO) research project. For such a project with many data records, it is necessary to provide a smooth graphical analysis to extract value from it. Methods: In this paper, we present the development of a full-stack Web application for the PREMO project, consisting of a dashboard providing statistical analysis, data visualization, data import, and data export. The main aspects of the application are described, as well as the diverse types of graphical representations and the possibility to use filters to extract relevant information for clinical practice. Results: The application, accessible through a browser, provides an interactive visualization of data from patients admitted to the intensive care unit (ICU), throughout the six waves of COVID-19 in two hospitals in Lisbon, Portugal. The analysis can be isolated per wave or can be seen in an aggregated view, allowing clinicians to create many views of the data and to study the behavior and consequences of different waves. For instance, the experimental results show clearly the effect of vaccination as well as the changes on the most relevant clinical parameters on each wave. Conclusions: The dashboard allows clinicians to analyze many variables of each of the six waves as well as aggregated data for all the waves. The application allows the user to extract information and scientific knowledge about COVID-19’s evolution, yielding insights for this pandemic and for future pandemics. Full article
(This article belongs to the Section Applied Biomedical Data Science)

17 pages, 1062 KiB  
Article
Non-Contact Blood Pressure Estimation Using Forehead and Palm Infrared Video
by Thomas Stogiannopoulos and Nikolaos Mitianoudis
BioMedInformatics 2024, 4(1), 437-453; https://doi.org/10.3390/biomedinformatics4010025 - 07 Feb 2024
Viewed by 568
Abstract
This study investigates the potential of low-cost infrared cameras for non-contact monitoring of blood pressure (BP) in individuals with fragile health, particularly the elderly. Previous research has shown success in developing non-contact BP monitoring using RGB cameras. In this study, the Eulerian Video Magnification (EVM) technique is employed to enhance minor variations in skin pixel intensity in specific regions of the forehead and palm captured by an infrared camera. The primary focus of this study is to explore the possibility of using infrared cameras for non-contact BP monitoring under low-light or night-time conditions. We show that a series of straightforward signal processing techniques combined with regression analysis achieves commendable outcomes in our experimental setup, surpassing the stringent accuracy standards set forth by the British Hypertension Society (BHS) and the Association for the Advancement of Medical Instrumentation (AAMI) protocol. Full article
(This article belongs to the Special Issue Feature Papers in Applied Biomedical Data Science)
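A simplified sketch of the temporal band-pass core of an EVM-style pipeline, followed by a regression step, is shown below; the ROI handling, the forehead-to-palm delay feature, and the linear regressor are assumptions for illustration rather than the authors' exact method.

```python
# Simplified sketch of temporal band-pass filtering of ROI intensities plus a
# regression step; feature choices and the regressor are assumptions.
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.linear_model import LinearRegression

def bandpass(signal, fs, low=0.7, high=4.0, order=3):
    """Keep frequencies in the typical heart-rate band (42-240 bpm)."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, signal)

def pulse_signal(roi_frames, fs):
    """roi_frames: (n_frames, h, w) infrared intensities of one ROI."""
    mean_intensity = roi_frames.reshape(len(roi_frames), -1).mean(axis=1)
    return bandpass(mean_intensity - mean_intensity.mean(), fs)

def pulse_delay(forehead, palm, fs):
    """Hypothetical feature: lag between forehead and palm pulse waves,
    estimated from the cross-correlation peak."""
    xcorr = np.correlate(forehead, palm, mode="full")
    lag = np.argmax(xcorr) - (len(palm) - 1)
    return lag / fs

# X = np.array([[pulse_delay(f, p, fs)] for f, p in recordings])
# model = LinearRegression().fit(X, systolic_bp)
```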
14 pages, 1526 KiB  
Article
Ensemble Methods to Optimize Automated Text Classification in Avatar Therapy
by Alexandre Hudon, Kingsada Phraxayavong, Stéphane Potvin and Alexandre Dumais
BioMedInformatics 2024, 4(1), 423-436; https://doi.org/10.3390/biomedinformatics4010024 - 07 Feb 2024
Viewed by 1256
Abstract
Background: Psychotherapeutic approaches such as Avatar Therapy (AT) are novel therapeutic attempts to help patients diagnosed with treatment-resistant schizophrenia. Qualitative analyses of immersive AT sessions have been undertaken to enhance and refine the existing interventions taking place in this therapy. To address the time-consuming and costly nature of manual classification and its potential misclassification biases, a prior implementation of a Linear Support Vector Classifier provided helpful insight. Single-model implementations for text classification are often limited, however, especially for datasets containing imbalanced data. The main objective of this study is to evaluate the change in accuracy of automated text classification machine learning algorithms when using an ensemble approach on immersive session verbatims of AT. Methods: An ensemble model comprising five machine learning algorithms was implemented to conduct text classification of avatar and patient interactions. The models included in this study are Multinomial Naïve Bayes, the Linear Support Vector Classifier, the Multi-Layer Perceptron classifier, the XGBClassifier, and the K-Nearest-Neighbor model. Accuracy, precision, recall, and F1-score were compared for the individual classifiers and the ensemble model. Results: The ensemble model achieved better accuracy than its individual counterparts. Conclusion: This ensemble methodology might be employed in future research to provide insight, with optimal precision, into the interactions being categorized and the therapeutic outcomes of patients based on their experience with AT. Full article
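A minimal scikit-learn/XGBoost sketch of a voting ensemble over the five classifiers listed in the abstract might look as follows; the TF-IDF features and hyperparameters are assumptions, and hard voting is used here because LinearSVC does not expose class probabilities.

```python
# Sketch of a hard-voting ensemble over the five named classifiers;
# feature extraction and hyperparameters are assumptions.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import VotingClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier

ensemble = make_pipeline(
    TfidfVectorizer(),
    VotingClassifier(
        estimators=[
            ("nb", MultinomialNB()),
            ("svc", LinearSVC()),          # no predict_proba, hence hard voting
            ("mlp", MLPClassifier(max_iter=500)),
            ("xgb", XGBClassifier()),
            ("knn", KNeighborsClassifier()),
        ],
        voting="hard",
    ),
)
# ensemble.fit(train_texts, train_labels)
# predictions = ensemble.predict(test_texts)
```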
13 pages, 15764 KiB  
Article
Lip-Reading Advancements: A 3D Convolutional Neural Network/Long Short-Term Memory Fusion for Precise Word Recognition
by Themis Exarchos, Georgios N. Dimitrakopoulos, Aristidis G. Vrahatis, Georgios Chrysovitsiotis, Zoi Zachou and Efthymios Kyrodimos
BioMedInformatics 2024, 4(1), 410-422; https://doi.org/10.3390/biomedinformatics4010023 - 04 Feb 2024
Viewed by 779
Abstract
Lip reading, the art of deciphering spoken words from the visual cues of lip movements, has garnered significant interest for its potential applications in diverse fields, including assistive technologies, human–computer interaction, and security systems. With the rapid advancements in technology and the increasing emphasis on non-verbal communication methods, the significance of lip reading has expanded beyond its traditional boundaries. These technological advancements have led to the generation of large-scale and complex datasets, necessitating the use of cutting-edge deep learning tools that are adept at handling such intricacies. In this study, we propose an innovative approach combining 3D Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks to tackle the challenging task of word recognition from lip movements. Our research leverages a meticulously curated dataset, named MobLip, encompassing various speech patterns, speakers, and environmental conditions. The synergy between the spatial information extracted by 3D CNNs and the temporal dynamics captured by LSTMs yields impressive results, achieving an accuracy rate of up to 87.5%, showcasing robustness to lighting variations and speaker diversity. Comparative experiments demonstrate our model’s superiority over existing lip-reading approaches, underlining its potential for real-world deployment. Furthermore, we discuss ethical considerations and propose avenues for future research, such as multimodal integration with audio data and expanded language support. In conclusion, our 3D CNN-LSTM architecture presents a promising solution to the complex problem of word recognition from lip movements, contributing to the advancement of communication technology and opening doors to innovative applications in an increasingly visual world. Full article
(This article belongs to the Special Issue Feature Papers on Methods in Biomedical Informatics)
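In the spirit of the described 3D CNN + LSTM fusion, a minimal PyTorch sketch is given below; the layer sizes, input resolution, and clip length are assumptions for illustration, not the model trained on MobLip.

```python
# Minimal PyTorch sketch of a 3D-CNN + LSTM word classifier; all layer
# sizes and the (frames, 64, 64) grayscale input shape are assumptions.
import torch
import torch.nn as nn

class LipReader(nn.Module):
    def __init__(self, num_words, hidden=256):
        super().__init__()
        # Spatio-temporal features from grayscale mouth crops:
        # input shape (batch, 1, frames, 64, 64).
        self.cnn = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
        )
        self.lstm = nn.LSTM(64 * 16 * 16, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_words)

    def forward(self, x):
        feats = self.cnn(x)                       # (B, C, T, H, W)
        b, c, t, h, w = feats.shape
        feats = feats.permute(0, 2, 1, 3, 4).reshape(b, t, c * h * w)
        _, (h_n, _) = self.lstm(feats)            # last hidden state per clip
        return self.fc(h_n[-1])                   # word logits

# logits = LipReader(num_words=100)(torch.randn(2, 1, 29, 64, 64))
```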
25 pages, 3126 KiB  
Review
Unravelling Insights into the Evolution and Management of SARS-CoV-2
by Aganze Gloire-Aimé Mushebenge, Samuel Chima Ugbaja, Nonkululeko Avril Mbatha, Rene B. Khan and Hezekiel M. Kumalo
BioMedInformatics 2024, 4(1), 385-409; https://doi.org/10.3390/biomedinformatics4010022 - 03 Feb 2024
Viewed by 1170
Abstract
Worldwide, the COVID-19 pandemic, caused by the novel coronavirus SARS-CoV-2, has claimed a sizable number of lives. The virus’s rapid spread and impact on every facet of human existence necessitate a continuous and dynamic examination of its biology and management. Despite this urgency, no specific antiviral treatments are currently available for COVID-19. As a result, scientists are concentrating on repurposing existing antiviral medications or creating new ones. This comprehensive review seeks to provide an in-depth exploration of our current understanding of SARS-CoV-2, starting with an analysis of its prevalence, pathology, and evolutionary trends. In doing so, the review aims to clarify the complex network of factors that have contributed to the varying case fatality rates observed in different geographic areas. We explore the complex world of SARS-CoV-2 mutations and their implications for vaccine efficacy and therapeutic interventions. The dynamic viral landscape of the pandemic poses a significant challenge, leading scientists to investigate the genetic foundations of the virus and the mechanisms underlying these genetic alterations. Numerous hypotheses have been proposed as the pandemic has developed, covering subjects such as the selection pressures driving mutation, the possibility of vaccine escape, and the consequences for clinical therapy. Furthermore, this review sheds light on current clinical trials investigating novel medicines and vaccine development, including the promising field of drug repurposing, providing a window into the changing landscape of treatment approaches. By compiling the huge and evolving body of knowledge on SARS-CoV-2, this study provides a comprehensive understanding of the virus, highlights its complexities and implications for public health, and ignites additional investigation into the control of this unprecedented global health disaster. Full article
25 pages, 3729 KiB  
Article
Toward Cancer Chemoprevention: Mathematical Modeling of Chemically Induced Carcinogenesis and Chemoprevention
by Dimitrios G. Boucharas, Chryssa Anastasiadou, Spyridon Karkabounas, Efthimia Antonopoulou and George Manis
BioMedInformatics 2024, 4(1), 360-384; https://doi.org/10.3390/biomedinformatics4010021 - 02 Feb 2024
Viewed by 887
Abstract
Cancer, currently rated as the second-leading cause of mortality across the globe, is one of the most hazardous disease groups that has plagued humanity for centuries. The experiments presented here span over two decades and were conducted on a specific species of mice, aiming to neutralize a highly carcinogenic agent by altering its chemical structure when combined with certain compounds. A plethora of growth models, each with distinctive qualities, is utilized in the investigation and explanation of the phenomena of chemically induced oncogenesis and prevention. The analysis ultimately formalizes the process of locating the growth model with the best descriptive power according to predefined criteria. This is accomplished through a methodological workflow that adopts a computational pipeline based on the Levenberg–Marquardt algorithm, with both novel and conventional metrics as well as a ruleset. The developed process simplifies the investigated phenomena, as the parameter space of the growth models is reduced. Near-term predictability proves strong (a 0.61% difference between the predicted and actual values), the fitted parameters differentiate between active compounds (classification results reach up to 96% in sensitivity and other performance metrics), and the distribution of parameter contributions supports the finding that the logistic growth model is the most appropriate (44.47%). In addition, the chemical dosage is doubled for the next round of trials, which reveals parallel behavior between the two dosages. As a consequence, the study provides important information on chemoprevention and the cycles of cancer proliferation and, if developed further, might lead to nutritional supplements that completely inhibit the expansion of cancerous tumors. The methodology provided can be used to describe other phenomena that progress over time and to estimate future results. Full article
(This article belongs to the Special Issue Computational Biology and Artificial Intelligence in Medicine)
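As a hedged illustration of fitting a logistic growth model with the Levenberg–Marquardt algorithm (SciPy's default for unconstrained curve fitting), the following sketch uses placeholder tumor-volume data; it is not the study's pipeline, data, or ruleset.

```python
# Illustrative logistic-growth fit via Levenberg-Marquardt; the time points
# and volumes below are placeholders, not the study's measurements.
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, K, r, t0):
    """Logistic growth: carrying capacity K, rate r, inflection time t0."""
    return K / (1.0 + np.exp(-r * (t - t0)))

t = np.array([0, 7, 14, 21, 28, 35, 42], dtype=float)        # days (placeholder)
volume = np.array([0.1, 0.3, 0.9, 2.1, 3.4, 4.1, 4.4])       # cm^3 (placeholder)

params, _ = curve_fit(logistic, t, volume, p0=[5.0, 0.2, 20.0], method="lm")
K, r, t0 = params
print(f"K={K:.2f} cm^3, r={r:.3f}/day, inflection at day {t0:.1f}")

# Near-term prediction, analogous to comparing predicted vs. actual values:
print("predicted volume at day 49:", logistic(49.0, *params))
```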
13 pages, 1028 KiB  
Article
Identifying Potent Fat Mass and Obesity-Associated Protein Inhibitors Using Deep Learning-Based Hybrid Procedures
by Kannan Mayuri, Durairaj Varalakshmi, Mayakrishnan Tharaheswari, Chaitanya Sree Somala, Selvaraj Sathya Priya, Nagaraj Bharathkumar, Renganathan Senthil, Raja Babu Singh Kushwah, Sundaram Vickram, Thirunavukarasou Anand and Konda Mani Saravanan
BioMedInformatics 2024, 4(1), 347-359; https://doi.org/10.3390/biomedinformatics4010020 - 01 Feb 2024
Viewed by 1188
Abstract
The fat mass and obesity-associated (FTO) protein catalyzes metal-dependent modifications of nucleic acids, namely the demethylation of methyl adenosine inside mRNA molecules. The FTO protein has been identified as a potential target for developing anticancer therapies. Identifying a suitable ligand targeting the FTO protein is crucial to developing chemotherapeutic medicines to combat obesity and cancer. Scientists worldwide have employed many methodologies to discover a potent inhibitor of the FTO protein. This study uses deep learning-based methods and molecular docking techniques to investigate the FTO protein as a target. Our strategy involves systematically screening a database of small chemical compounds. By utilizing crystal structures of FTO complexed with ligands, we identified three small-molecule compounds (ZINC000003643476, ZINC000000517415, and ZINC000001562130) as inhibitors of the FTO protein. The identification was accomplished by employing a combination of screening techniques, specifically deep learning (DeepBindGCN) and AutoDock Vina, on the ZINC database. These compounds were subjected to comprehensive analysis using 100 ns molecular dynamics simulations and binding free energy calculations. The findings of our study indicate three candidate inhibitors that might effectively target the human fat mass and obesity-associated protein. These results have the potential to facilitate the exploration of other chemicals that interact with FTO, and biochemical studies evaluating these compounds’ effectiveness may contribute to improving treatment strategies for fat mass and obesity. Full article
(This article belongs to the Topic Machine Learning Empowered Drug Screen)
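Under assumptions about file names and the docking grid box, the docking half of such a screen could be driven from Python as sketched below; this calls the AutoDock Vina command-line tool with its documented options, omits the DeepBindGCN step entirely, and is not the authors' pipeline.

```python
# Hedged sketch: dock each candidate ligand with the AutoDock Vina CLI and
# rank by the best reported affinity. Paths and grid values are assumptions.
import subprocess
from pathlib import Path

def dock(receptor, ligand, out_dir, center=(0.0, 0.0, 0.0), size=(20, 20, 20)):
    out = Path(out_dir) / (Path(ligand).stem + "_docked.pdbqt")
    subprocess.run([
        "vina",
        "--receptor", receptor, "--ligand", ligand, "--out", str(out),
        "--center_x", str(center[0]), "--center_y", str(center[1]),
        "--center_z", str(center[2]),
        "--size_x", str(size[0]), "--size_y", str(size[1]),
        "--size_z", str(size[2]),
        "--exhaustiveness", "8",
    ], check=True)
    # The best pose affinity (kcal/mol) appears in a "REMARK VINA RESULT" line.
    for line in out.read_text().splitlines():
        if line.startswith("REMARK VINA RESULT"):
            return float(line.split()[3])
    return None

# scores = {lig: dock("fto.pdbqt", lig, "poses") for lig in ligand_files}
# best = sorted(scores, key=scores.get)[:3]   # three strongest binders
```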
21 pages, 3029 KiB  
Article
Depleted-MLH1 Expression Predicts Prognosis and Immunotherapeutic Efficacy in Uterine Corpus Endometrial Cancer: An In Silico Approach
by Tesfaye Wolde, Jing Huang, Peng Huang, Vijay Pandey and Peiwu Qin
BioMedInformatics 2024, 4(1), 326-346; https://doi.org/10.3390/biomedinformatics4010019 - 01 Feb 2024
Viewed by 874
Abstract
Uterine corpus endometrial carcinoma (UCEC) poses significant clinical challenges due to its high incidence and poor prognosis, exacerbated by the lack of effective screening methods. The standard treatment for UCEC typically involves surgical intervention, with radiation and chemotherapy as potential adjuvant therapies. In recent years, immunotherapy has emerged as a promising avenue for the advanced treatment of UCEC. This study employs a multi-omics approach, analyzing RNA-sequencing data and clinical information from The Cancer Genome Atlas (TCGA), Gene Expression Profiling Interactive Analysis (GEPIA), and GeneMANIA databases to investigate the prognostic value of MutL Homolog 1 (MLH1) gene expression in UCEC. The dysregulation of MLH1 in UCEC is linked to adverse prognostic outcomes and suppressed immune cell infiltration. Gene Set Enrichment Analysis (GSEA) data reveal MLH1’s involvement in immune-related processes, while its expression correlates with tumor mutational burden (TMB) and microsatellite instability (MSI). Lower MLH1 expression is associated with poorer prognosis, reduced responsiveness to Programmed cell death protein 1 (PD-1)/Programmed death-ligand 1 (PD-L1) inhibitors, and heightened sensitivity to anti-cancer agents. This comprehensive analysis establishes MLH1 as a potential biomarker for predicting prognosis, immunotherapy response, and drug sensitivity in UCEC, offering crucial insights for the clinical management of patients. Full article
(This article belongs to the Special Issue Feature Papers in Computational Biology and Medicine)
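As a hedged illustration of the kind of survival comparison underlying such prognostic claims, the sketch below uses the lifelines package; the median expression cut-off, file, and column names are assumptions, not the study's exact analysis.

```python
# Illustrative Kaplan-Meier and log-rank comparison of MLH1-low vs. MLH1-high
# tumors; the merged clinical/expression table and its columns are assumed.
import pandas as pd
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

df = pd.read_csv("ucec_clinical_expression.csv")   # assumed merged TCGA table
low = df["MLH1_expr"] <= df["MLH1_expr"].median()   # assumed median split

kmf = KaplanMeierFitter()
for label, mask in [("MLH1-low", low), ("MLH1-high", ~low)]:
    kmf.fit(df.loc[mask, "os_months"], df.loc[mask, "os_event"], label=label)
    kmf.plot_survival_function()

result = logrank_test(
    df.loc[low, "os_months"], df.loc[~low, "os_months"],
    event_observed_A=df.loc[low, "os_event"],
    event_observed_B=df.loc[~low, "os_event"],
)
print("log-rank p-value:", result.p_value)
```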