Novel Approaches to Machine Learning and Artificial Intelligence in Cancer Research and Care

A special issue of Cancers (ISSN 2072-6694). This special issue belongs to the section "Cancer Informatics and Big Data".

Deadline for manuscript submissions: 31 January 2025 | Viewed by 11934

Special Issue Editors


Guest Editor
Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
Interests: brain tumors; CNS neoplasms; image-guided therapy; MRI biomarkers; clinical trials

Guest Editor
1. Division of Internal Medicine, Section of Patient Centered Analytics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
2. MD Anderson Center for INSPiRED Cancer Care (Integrated Systems for Patient-Reported Data), The University of Texas MD Anderson Cancer Center, Houston, TX, USA
Interests: machine learning; statistical analysis; quantitative data analysis; research methodology; clinical trials

Special Issue Information

Dear Colleagues,

This Special Issue of Cancers will highlight cutting-edge research on new strategies to improve the performance and safe clinical translation of machine learning (ML) and artificial intelligence (AI) approaches in oncology research and care. Specifically, we aim to address four key elements of AI/ML development and implementation that will help cancer discovery drive clinical impact: data, models, evaluation, and systems. Topics of interest in data may include new standards for data quality and provenance, and opportunities around the inclusion of metadata in model development and implementation. Topics of interest in models may include efficient and interpretable algorithms suitable for deployment in health systems. Regarding model evaluation, we welcome papers presenting new strategies for evaluating ML and AI through clinical trials or for ensuring the validity of text-generation tools. From a systems perspective, systems for deploying and monitoring algorithms in practice, enabling accelerated and safe deployment across health systems, as well as approaches and standards for flagging model behaviors that may lead to unexpected outcomes, are of interest.

Dr. Caroline Chung
Dr. Christopher Gibbons
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Cancers is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2900 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning
  • deep learning
  • artificial intelligence
  • interpretable AI
  • data quality
  • metadata
  • cancer
  • MLOps
  • data science
  • real world data

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (8 papers)


Research

37 pages, 2077 KiB  
Article
Enhancing Cancerous Gene Selection and Classification for High-Dimensional Microarray Data Using a Novel Hybrid Filter and Differential Evolutionary Feature Selection
by Arshad Hashmi, Waleed Ali, Anas Abulfaraj, Faisal Binzagr and Entisar Alkayal
Cancers 2024, 16(23), 3913; https://doi.org/10.3390/cancers16233913 - 22 Nov 2024
Viewed by 315
Abstract
Background: In recent years, microarray datasets have been used to store information about human genes and their expression in order to diagnose cancer at an early stage. However, most microarray datasets contain thousands of redundant, irrelevant, and noisy genes, which poses a major challenge for effectively applying machine learning algorithms to these high-dimensional data. Methods: To address this challenge, this paper proposes a hybrid filter and differential evolution (DE)-based feature selection method that chooses only the most influential genes (features) of high-dimensional microarray datasets to improve cancer diagnosis and classification. The proposed approach is a two-phase hybrid feature selection model: it first selects the top-ranked features using several popular filter feature selection methods and then identifies the optimal subset of these features through DE optimization. Several popular machine learning algorithms are then trained on the final training microarray datasets, which contain only the best features, to produce cancer classification results. Four high-dimensional cancer microarray datasets (Breast, Lung, Central Nervous System (CNS), and Brain) were used to evaluate the proposed method. Results: The experimental results demonstrate that the proposed hybrid filter-DE method increased classification accuracy over the filter methods alone to 100%, 100%, 93%, and 98% on the Brain, CNS, Breast, and Lung datasets, respectively. Furthermore, the DE-based feature selection removed around 50% of the features selected by the filter methods for these four datasets.
The average accuracy improvements achieved by the proposed methods were up to 42.47%, 57.45%, 16.28%, and 43.57%, compared with 41.43%, 53.66%, 17.53%, and 61.70% in previous works, on the Brain, CNS, Lung, and Breast datasets, respectively. Conclusions: Compared with previous works, the proposed methods achieved greater improvements on the Brain and CNS datasets, comparable improvements on the Lung dataset, and smaller improvements on the Breast dataset.
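The two-phase idea described above (filter ranking, then differential evolution over the surviving features) can be sketched in plain Python. This is a minimal illustration under our own assumptions, not the authors' implementation: the filter score is a simple t-like statistic, each DE candidate is a continuous vector thresholded at 0.5 into a feature mask, and fitness is the training accuracy of a nearest-centroid classifier minus a small penalty per selected feature. The function names `filter_rank`, `centroid_accuracy`, and `de_select` and all parameter values are hypothetical.

```python
import random
import statistics


def filter_rank(X, y, top_k):
    """Phase 1 (filter): score each feature and keep the top_k indices.

    Score = |mean_class0 - mean_class1| / (std_class0 + std_class1),
    a simple t-like statistic chosen here for illustration only.
    """
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        spread = statistics.pstdev(a) + statistics.pstdev(b) + 1e-9
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / spread)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:top_k]


def centroid_accuracy(X, y, features):
    """Training accuracy of a nearest-centroid classifier on the given features."""
    if not features:
        return 0.0
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [statistics.mean(r[j] for r in rows) for j in features]

    def predict(row):
        return min(centroids, key=lambda c: sum(
            (row[j] - m) ** 2 for j, m in zip(features, centroids[c])))

    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def de_select(X, y, candidates, pop_size=12, generations=30,
              F=0.8, CR=0.9, penalty=0.01, seed=0):
    """Phase 2: DE/rand/1/bin over [0, 1]^d vectors, thresholded into masks."""
    rng = random.Random(seed)
    d = len(candidates)

    def to_mask(vec):
        return [candidates[i] for i in range(d) if vec[i] > 0.5]

    def fitness(vec):
        feats = to_mask(vec)
        return centroid_accuracy(X, y, feats) - penalty * len(feats)

    pop = [[rng.random() for _ in range(d)] for _ in range(pop_size)]
    fit = [fitness(v) for v in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            forced = rng.randrange(d)  # at least one gene comes from the mutant
            trial = []
            for j in range(d):
                if rng.random() < CR or j == forced:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    trial.append(min(1.0, max(0.0, v)))  # clip to [0, 1]
                else:
                    trial.append(pop[i][j])
            f_trial = fitness(trial)
            if f_trial >= fit[i]:  # greedy one-to-one selection
                pop[i], fit[i] = trial, f_trial
    best = max(range(pop_size), key=lambda i: fit[i])
    return sorted(to_mask(pop[best])), fit[best]


# Toy data: features 0 and 1 carry the class signal, features 2-9 are noise.
rng = random.Random(1)
y = [0] * 20 + [1] * 20
X = [[2.0 * lab + rng.gauss(0, 0.5) if j < 2 else rng.gauss(0, 1)
      for j in range(10)] for lab in y]
top = filter_rank(X, y, top_k=5)
selected, score = de_select(X, y, top)
```

The penalty term is one simple way to express the paper's stated goal of keeping fewer features; real microarray studies would evaluate fitness with cross-validation rather than training accuracy.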

9 pages, 1344 KiB  
Article
Artificial Intelligence and Colposcopy: Automatic Identification of Vaginal Squamous Cell Carcinoma Precursors
by Miguel Mascarenhas, Inês Alencoão, Maria João Carinhas, Miguel Martins, Tiago Ribeiro, Francisco Mendes, Pedro Cardoso, Maria João Almeida, Joana Mota, Joana Fernandes, João Ferreira, Guilherme Macedo, Teresa Mascarenhas and Rosa Zulmira
Cancers 2024, 16(20), 3540; https://doi.org/10.3390/cancers16203540 - 20 Oct 2024
Viewed by 723
Abstract
Background/Objectives: While human papillomavirus (HPV) is well known for its role in cervical cancer, it also affects vaginal cancers. Although colposcopy offers a comprehensive examination of the female genital tract, its diagnostic accuracy remains suboptimal. Integrating artificial intelligence (AI) could enhance the cost-effectiveness of colposcopy, but no AI models specifically differentiate low-grade (LSILs) and high-grade (HSILs) squamous intraepithelial lesions in the vagina. This study aims to develop and validate an AI model for the differentiation of HPV-associated dysplastic lesions in this region. Methods: A convolutional neural network (CNN) model was developed to differentiate HSILs from LSILs in still images from vaginoscopy (performed during colposcopy). The AI model was developed on a dataset of 57,250 frames obtained from 71 procedures (90% for training/validation, including 5-fold cross-validation, and 10% for testing). The model was evaluated based on its sensitivity, specificity, accuracy, and area under the receiver operating characteristic curve (AUROC). Results: For HSIL/LSIL differentiation in the vagina, the CNN demonstrated, during the training/validation phase, a mean sensitivity, specificity, and accuracy of 98.7% (95% CI 96.7–100.0%), 99.1% (95% CI 98.1–100.0%), and 98.9% (95% CI 97.9–99.8%), respectively. The mean AUROC was 0.990 ± 0.004. In the testing phase, sensitivity was 99.6%, and specificity and accuracy were both 99.7%. Conclusions: This is the first AI model developed worldwide that is capable of HSIL/LSIL differentiation in the vaginal region, and it demonstrates high and robust performance metrics. Its effective application paves the way for AI-powered colposcopic assessment across the entire female genital tract, offering a significant advancement in women’s healthcare worldwide.
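The metrics reported above can be computed from binary labels, binary predictions, and continuous scores. The sketch below is our illustration, not the study's evaluation code; it assumes HSIL is coded as the positive class (1) and computes AUROC with the rank-based Mann-Whitney formulation, counting tied scores as half. The function names and toy data are hypothetical.

```python
def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "accuracy": (tp + tn) / len(y_true),
    }


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: four positives (HSIL) and four negatives (LSIL).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption
metrics = confusion_metrics(y_true, y_pred)
area = auroc(y_true, scores)
```

Here sensitivity, specificity, and accuracy all come out at 0.75, while the AUROC is 0.9375, which illustrates that AUROC summarizes ranking quality independently of any single threshold.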

19 pages, 829 KiB  
Article
Learning from Imbalanced Data: Integration of Advanced Resampling Techniques and Machine Learning Models for Enhanced Cancer Diagnosis and Prognosis
by Fatih Gurcan and Ahmet Soylu
Cancers 2024, 16(19), 3417; https://doi.org/10.3390/cancers16193417 - 8 Oct 2024
Viewed by 975
Abstract
Background/Objectives: This study aims to evaluate the performance of various classification algorithms and resampling methods across multiple diagnostic and prognostic cancer datasets, addressing the challenges of class imbalance. Methods: A total of five datasets were analyzed, including three diagnostic datasets (Wisconsin Breast Cancer Database, Cancer Prediction Dataset, Lung Cancer Detection Dataset) and two prognostic datasets (SEER Breast Cancer Dataset, Differentiated Thyroid Cancer Recurrence Dataset). Nineteen resampling methods from three categories were employed, and ten classifiers from four distinct categories were compared. Results: The results demonstrated that hybrid sampling methods, particularly SMOTEENN, achieved the highest mean performance at 98.19%, followed by IHT (97.20%) and RENN (96.48%). In terms of classifiers, Random Forest showed the best performance with a mean value of 94.69%, with Balanced Random Forest and XGBoost following closely. The baseline method (no resampling) yielded a significantly lower performance of 91.33%, highlighting the effectiveness of resampling techniques in improving model outcomes. Conclusions: This research underscores the importance of resampling methods in enhancing classification performance on imbalanced datasets, providing valuable insights for researchers and healthcare professionals. The findings serve as a foundation for future studies aimed at integrating machine learning techniques in cancer diagnosis and prognosis, with recommendations for further research on hybrid models and clinical applications.
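SMOTEENN, the best-performing resampler above, combines SMOTE oversampling (synthetic minority samples interpolated between a minority point and one of its minority-class nearest neighbors) with Edited Nearest Neighbors (ENN) cleaning (dropping samples whose neighbors disagree with their label). The pure-Python sketch below is a simplified illustration, not the study's implementation: we assume k = 3 neighbors, balance the classes exactly, and use a majority-vote ENN rule. The function names are hypothetical; in practice, the imbalanced-learn library provides `imblearn.combine.SMOTEENN` with a `fit_resample` method.

```python
import math
import random


def neighbors(points, idx, k):
    """Indices of the k nearest neighbors of points[idx], excluding idx itself."""
    others = [i for i in range(len(points)) if i != idx]
    others.sort(key=lambda i: math.dist(points[i], points[idx]))
    return others[:k]


def smote(X, y, minority, k=3, seed=0):
    """Add synthetic minority samples until the two classes are the same size."""
    rng = random.Random(seed)
    minor = [x for x, lab in zip(X, y) if lab == minority]
    n_new = sum(lab != minority for lab in y) - len(minor)
    synthetic = []
    for _ in range(max(0, n_new)):
        i = rng.randrange(len(minor))
        j = rng.choice(neighbors(minor, i, k))
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(minor[i], minor[j])])
    return X + synthetic, y + [minority] * len(synthetic)


def enn(X, y, k=3):
    """Keep only samples whose label agrees with most of their k neighbors."""
    keep_X, keep_y = [], []
    for i in range(len(X)):
        votes = [y[j] for j in neighbors(X, i, k)]
        if votes.count(y[i]) * 2 >= len(votes):
            keep_X.append(X[i])
            keep_y.append(y[i])
    return keep_X, keep_y


def smote_enn(X, y, minority, k=3, seed=0):
    """Oversample with SMOTE, then clean the combined set with ENN."""
    X_os, y_os = smote(X, y, minority, k, seed)
    return enn(X_os, y_os, k)


# Toy imbalanced data: 30 majority points near (0, 0), 6 minority near (3, 3).
rng = random.Random(42)
X = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)]
     + [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(6)])
y = [0] * 30 + [1] * 6
X_res, y_res = smote_enn(X, y, minority=1)
```

Oversampling before cleaning is what makes the method "hybrid": SMOTE fixes the class ratio, and ENN then removes points in the overlap region, including noisy synthetic ones.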

15 pages, 1346 KiB  
Article
Prediction of Mismatch Repair Status in Endometrial Cancer from Histological Slide Images Using Various Deep Learning-Based Algorithms
by Mina Umemoto, Tasuku Mariya, Yuta Nambu, Mai Nagata, Toshihiro Horimai, Shintaro Sugita, Takayuki Kanaseki, Yuka Takenaka, Shota Shinkai, Motoki Matsuura, Masahiro Iwasaki, Yoshihiko Hirohashi, Tadashi Hasegawa, Toshihiko Torigoe, Yuichi Fujino and Tsuyoshi Saito
Cancers 2024, 16(10), 1810; https://doi.org/10.3390/cancers16101810 - 9 May 2024
Viewed by 1330
Abstract
The application of deep learning algorithms to predict the molecular profiles of various cancers from digital images of hematoxylin and eosin (H&E)-stained slides has been reported in recent years, mainly for gastric and colon cancers. In this study, we investigated the potential use of H&E-stained endometrial cancer slide images to predict the associated mismatch repair (MMR) status. H&E-stained slide images were collected from 127 cases of the primary lesion of endometrial cancer. After digitization using a NanoZoomer virtual slide scanner (Hamamatsu Photonics), we divided the scanned images into 5397 tiles of 512 × 512 pixels. The MMR proteins (PMS2, MSH6) were immunohistochemically stained, classified as MMR proficient or deficient, and annotated for each case and tile. We trained several neural networks, including convolutional and attention-based networks, using tiles annotated with the MMR status. Among the tested networks, ResNet50 exhibited the highest area under the receiver operating characteristic curve (AUROC), 0.91, for predicting the MMR status. The constructed prediction algorithm may be applicable to other molecular profiles and useful for pre-screening before implementing other, more costly genetic profiling tests.
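The tiling step described above (cutting each scanned slide into 512 × 512 pixel tiles) and one way of turning tile predictions into a case-level call can be sketched as follows. This is our illustration only: the non-overlapping grid, discarding partial edge tiles, and averaging tile probabilities with a 0.5 threshold are assumptions, not details reported in the abstract, and the function names are hypothetical.

```python
def tile_grid(width, height, tile=512):
    """Top-left (x, y) corners of non-overlapping tile x tile patches.

    Partial tiles at the right and bottom edges are discarded.
    """
    return [
        (x, y)
        for y in range(0, height - tile + 1, tile)
        for x in range(0, width - tile + 1, tile)
    ]


def case_prediction(tile_probs, threshold=0.5):
    """Average tile-level MMR-deficiency probabilities into one case-level call."""
    mean_prob = sum(tile_probs) / len(tile_probs)
    label = "deficient" if mean_prob >= threshold else "proficient"
    return label, mean_prob


# A 1300 x 1100 pixel region yields a 2 x 2 grid of full 512-pixel tiles.
corners = tile_grid(1300, 1100)
label, prob = case_prediction([0.2, 0.7, 0.9, 0.8])
```

Each corner would then be used to crop a patch for the classifier; mean pooling is only one aggregation choice, and alternatives such as majority voting over tiles are equally plausible.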

13 pages, 4993 KiB  
Article
Predicting Immunotherapy Outcomes in Glioblastoma Patients through Machine Learning
by Guillaume Mestrallet
Cancers 2024, 16(2), 408; https://doi.org/10.3390/cancers16020408 - 18 Jan 2024
Cited by 2 | Viewed by 2000
Abstract
Glioblastoma is a highly aggressive cancer associated with a dismal prognosis, with a mere 5% of patients surviving beyond five years post diagnosis. Current therapeutic modalities encompass surgical intervention, radiotherapy, chemotherapy, and immune checkpoint blockade (ICB). However, the efficacy of ICB remains limited in glioblastoma patients, necessitating a proactive approach to anticipate treatment response and resistance. In this study, we analyzed two distinct cohorts of glioblastoma patients treated with PD-1 blockade. We found that a significant portion (60%) of patients exhibited persistent disease progression despite ICB. To elucidate the mechanisms of resistance, we characterized the immune profiles of glioblastoma patients with continued cancer progression following anti-PD-1 therapy. These profiles revealed multifaceted defects, encompassing compromised macrophage, monocyte, and T follicular helper responses, impaired antigen presentation, aberrant regulatory T cell (Treg) responses, and heightened expression of immunosuppressive molecules (TGFB, IL2RA, and CD276). Building upon these resistance profiles, we used machine learning algorithms to develop predictive models and accompanying software. This computational tool accurately forecast the progression status of 82.82% of the glioblastoma patients in our study following ICB, based on their unique immune characteristics. In conclusion, our approach advocates for the personalization of immunotherapy in glioblastoma patients. By harnessing patient-specific attributes and computational predictions, we offer a promising avenue for improving clinical outcomes of immunotherapy.
This shift towards tailored therapies underscores the potential to transform the management of glioblastoma, opening new horizons for improved patient care.

18 pages, 6583 KiB  
Article
Automated Laryngeal Cancer Detection and Classification Using Dwarf Mongoose Optimization Algorithm with Deep Learning
by Nuzaiha Mohamed, Reem Lafi Almutairi, Sayda Abdelrahim, Randa Alharbi, Fahad Mohammed Alhomayani, Bushra M. Elamin Elnaim, Azhari A. Elhag and Rajendra Dhakal
Cancers 2024, 16(1), 181; https://doi.org/10.3390/cancers16010181 - 29 Dec 2023
Cited by 4 | Viewed by 1895
Abstract
Laryngeal cancer (LCA) is a serious disease with a concerning global rise in incidence. Accurate treatment of LCA is particularly challenging in later stages, owing to its complex nature as a head and neck malignancy. To address this challenge, researchers have been actively developing analysis methods and tools to help medical professionals identify LCA efficiently. However, existing tools and methods often suffer from limitations, including low accuracy in early-stage LCA detection, high computational complexity, and lengthy patient screening times. With this motivation, this study presents an Automated Laryngeal Cancer Detection and Classification using a Dwarf Mongoose Optimization Algorithm with Deep Learning (ALCAD-DMODL) technique. The main objective of the ALCAD-DMODL method is to recognize the presence of LCA using a DL model. In the ALCAD-DMODL technique, a median filtering (MF)-based process first removes noise from the images. The EfficientNet-B0 model is then used to derive feature vectors from the pre-processed images, and the DMO algorithm is applied to select optimal hyperparameters for EfficientNet-B0. Finally, a multi-head bidirectional gated recurrent unit (MBGRU) model is applied for the recognition and classification of LCA. The ALCAD-DMODL technique was evaluated in simulations on the throat region image dataset, and the comparative study demonstrated its superiority across several performance measures.
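The median-filtering (MF) noise-removal step mentioned above replaces each pixel with the median of its neighborhood, which suppresses isolated salt-and-pepper pixels while preserving edges better than averaging does. A minimal version for a grayscale image stored as nested lists is sketched below; the 3 × 3 window (radius 1), the border handling (replicating the nearest edge pixel), and the function name are our assumptions, not details from the paper.

```python
import statistics


def median_filter(image, radius=1):
    """Median-filter a 2D grayscale image (list of rows) with a square window.

    Pixels outside the image are replaced by the nearest border pixel.
    """
    h, w = len(image), len(image[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            window = [
                image[min(max(r + dr, 0), h - 1)][min(max(c + dc, 0), w - 1)]
                for dr in range(-radius, radius + 1)
                for dc in range(-radius, radius + 1)
            ]
            row.append(statistics.median(window))
        out.append(row)
    return out


# A flat image with one "salt" (255) and one "pepper" (0) pixel.
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 10, 0],
    [10, 10, 10, 10],
]
clean = median_filter(noisy)
```

Both outliers are outvoted by the surrounding values, so every pixel of `clean` returns to 10; a mean filter would instead smear the 255 across its neighbors.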

Review

19 pages, 744 KiB  
Review
Artificial Intelligence-Based Management of Adult Chronic Myeloid Leukemia: Where Are We and Where Are We Going?
by Simona Bernardi, Mauro Vallati and Roberto Gatta
Cancers 2024, 16(5), 848; https://doi.org/10.3390/cancers16050848 - 20 Feb 2024
Cited by 2 | Viewed by 1910
Abstract
Artificial intelligence (AI) is emerging as a discipline capable of providing significant added value in medicine, in particular in radiomics, image analysis, and big dataset analysis, as well as for generating virtual patient cohorts. However, in the management of chronic myeloid leukemia (CML), considered an easily managed malignancy since the introduction of tyrosine kinase inhibitors (TKIs), which strongly improved patients' life expectancy, AI is still in its infancy. Notably, the findings of initial trials are intriguing and encouraging, both in terms of performance and adaptability to the different contexts in which AI can be applied. Indeed, improving diagnosis and prognosis by leveraging biochemical, biomolecular, imaging, and clinical data can be crucial for implementing the personalized medicine paradigm or streamlining procedures and services. In this review, we present the state of the art of AI applications in the field of CML, describing the techniques and objectives, with a general focus that goes beyond machine learning (ML) to embrace the wider AI field. This scoping review covers publications indexed in PubMed from 2003 to 2023, retrieved by searching for “chronic myeloid leukemia” and “artificial intelligence”. This time frame reflects the actual literature output and was not restricted. We also take the opportunity to discuss the main pitfalls and key points to which AI must respond, especially considering the critical role of the ‘human’ factor, which remains key in this domain.

Other

29 pages, 1396 KiB  
Systematic Review
The Performance and Clinical Applicability of HER2 Digital Image Analysis in Breast Cancer: A Systematic Review
by Gauhar Dunenova, Zhanna Kalmataeva, Dilyara Kaidarova, Nurlan Dauletbaev, Yuliya Semenova, Madina Mansurova, Andrej Grjibovski, Fatima Kassymbekova, Aidos Sarsembayev, Daniil Semenov and Natalya Glushkova
Cancers 2024, 16(15), 2761; https://doi.org/10.3390/cancers16152761 - 3 Aug 2024
Viewed by 1738
Abstract
This systematic review aims to address the research gap concerning the performance of computational algorithms for the digital image analysis of HER2 images in clinical settings. While numerous studies have explored various aspects of these algorithms, there is a lack of comprehensive evaluation of their effectiveness in real-world clinical applications. We searched the Web of Science and PubMed databases for studies published from 31 December 2013 to 30 June 2024, focusing on performance effectiveness and on components such as dataset size, diversity and source, ground truth, annotation, and validation methods. The study was registered with PROSPERO (CRD42024525404). Key questions guiding this review include the following: How effective are current computational algorithms at detecting HER2 status in digital images? What are the common validation methods and dataset characteristics used in these studies? Is there standardization of algorithm evaluations for clinical applications that can improve the clinical utility and reliability of computational tools for HER2 detection in digital image analysis? We identified 6833 publications, of which 25 met the inclusion criteria. Accuracy on clinical datasets ranged from 84.19% to 97.9%. On synthesized datasets, the highest accuracy (98.8%) was achieved on the publicly available Warwick dataset. Only 12% of studies used separate datasets for external validation; 64% of studies used a combination of accuracy, precision, recall, and F1 as their set of performance measures. Despite the high accuracy rates reported in these studies, there is a notable absence of direct evidence supporting their clinical application. To facilitate the integration of these technologies into clinical practice, there is an urgent need to address real-world challenges and the overreliance on internal validation.
Standardizing study designs on real clinical datasets can enhance the reliability and clinical applicability of computational algorithms in improving HER2 detection.
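For reference, the four measures that most included studies combined are conventionally defined from the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). These are the standard textbook definitions, not formulas taken from the review itself:

```latex
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}, &
\text{Precision} &= \frac{TP}{TP + FP}, \\
\text{Recall} &= \frac{TP}{TP + FN}, &
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}.
\end{aligned}
```

Because accuracy depends on class prevalence while precision, recall, and F1 focus on the positive class, reporting them together gives a fuller picture when HER2-positive cases are a minority of the dataset.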
