Article
Peer-Review Record

Integrating Structured and Unstructured Data with BERTopic and Machine Learning: A Comprehensive Predictive Model for Mortality in ICU Heart Failure Patients

Appl. Sci. 2024, 14(17), 7546; https://doi.org/10.3390/app14177546
by Shih-Wei Wu 1, Cheng-Cheng Li 2, Te-Nien Chien 2,* and Chuan-Mei Chu 2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 19 July 2024 / Revised: 12 August 2024 / Accepted: 21 August 2024 / Published: 26 August 2024
(This article belongs to the Special Issue Artificial Intelligence in Medicine and Healthcare)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This study introduces a novel method for predicting ICU mortality in heart failure patients by combining electronic health record (EHR) data with a BERTopic-based hybrid machine learning approach. The use of natural language processing (NLP) to analyze unstructured data significantly enhances the model's accuracy. By merging structured and unstructured data, the study identifies key variables that improve predictive performance. The research highlights the potential of combining machine learning and NLP for better mortality prediction in ICU patients. The manuscript is well written, and clear rationales are provided for how the models were built.

 

I have only a few comments:

1. On page 7, equations (1)-(4): can the authors define the notation (e.g., what are PPV and TP?) in case readers find it confusing?

2. Can the authors add some plots to explain the results in Table 2 and Table 3? For example, box plots for each algorithm vs. time?

Author Response

Response to Reviewer #1

 

We sincerely appreciate the insightful comments and suggestions provided by you. Your feedback has been instrumental in enhancing the clarity and depth of our manuscript, ensuring a more comprehensive presentation of our research. The detailed review has undoubtedly improved the overall quality of our paper, making it more informative and accessible to our readers. We are grateful for your contributions to refining our study and our approach to presenting our findings.

 

This study introduces a novel method for predicting ICU mortality in heart failure patients by combining electronic health record (EHR) data with a BERTopic-based hybrid machine learning approach. The use of natural language processing (NLP) to analyze unstructured data significantly enhances the model's accuracy. By merging structured and unstructured data, the study identifies key variables that improve predictive performance. The research highlights the potential of combining machine learning and NLP for better mortality prediction in ICU patients. The manuscript is well written, and clear rationales are provided for how the models were built.

 

I have only a few comments:

  1. On page 7, equations (1)-(4): can the authors define the notation (e.g., what are PPV and TP?) in case readers find it confusing?

Corrections and adjustments:

Thank you for your valuable suggestions. We have revised the manuscript, especially the Methods and Results sections, as shown below.

Performance Evaluation

Our study rigorously assessed the impact of integrating both quantitative and textual data on the predictive accuracy of mortality in ICU patients through the application of five widely acknowledged evaluation metrics in binary classification tasks. These metrics serve as key indicators of model effectiveness and precision:

  • Precision (PPV): Denoting the accuracy of positive predictions, precision is calculated as the ratio of true positives (TP) to the sum of true positives (TP) and false positives (FP).

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{1}$$

  • Recall (Sensitivity, TPR): Recall quantifies the model's ability to correctly identify positive cases and is calculated as the ratio of true positives (TP) to the sum of true positives (TP) and false negatives (FN).

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{2}$$

  • F1-Score: This is a comprehensive metric combining precision and recall, calculated as the harmonic mean of the two, offering a balanced measure of overall performance.

$$\mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{3}$$

  • Accuracy: Accuracy assesses the overall correctness of predictions and is calculated as the ratio of the sum of true positives (TP) and true negatives (TN) to the total number of samples in the test dataset.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{4}$$

  • AUROC (Area Under the Receiver Operating Characteristic Curve): AUROC is a common diagnostic metric that provides a comprehensive measure of classifier performance by plotting the relationship between the true positive rate (TPR) and the false positive rate (FPR) across various classification thresholds.

By employing these evaluation metrics, the study aimed to provide a thorough and standardized evaluation of predictive models that integrate both quantitative and text data, specifically in the context of mortality prediction for ICU patients.
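As an illustration only, the count-based metrics in equations (1)-(4) and the AUROC can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the function names `classification_metrics` and `auroc` are hypothetical, the zero-denominator fallbacks are an assumption, and the AUROC is computed with the standard pairwise-ranking formulation (the probability that a randomly chosen positive case is scored above a randomly chosen negative case, with ties counted as one half).

```python
from typing import Dict, Sequence


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute precision, recall, F1 and accuracy from confusion-matrix counts.

    Assumption: a metric with a zero denominator is reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0          # equation (1)
    recall = tp / (tp + fn) if (tp + fn) else 0.0             # equation (2)
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0     # equation (3)
    accuracy = (tp + tn) / (tp + tn + fp + fn)                # equation (4)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return correct / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of samples, so it is suited to explanation rather than to large test sets; a library routine would normally be used in practice.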

 

  2. Can the authors add some plots to explain the results in Table 2 and Table 3? For example, box plots for each algorithm vs. time?

Corrections and adjustments:

Thank you for your valuable suggestions. We have revised the manuscript, especially the Methods and Results sections, and added the figures described below.

 

Figure 4 further illustrates the improvement in model applicability achieved through the integration of textual and quantitative data. The inclusion of qualitative data results in a substantial enhancement of AUROC scores, demonstrating that the models exhibit superior effectiveness in predicting mortality across different machine learning methods and temporal intervals.

Figure 4. AUROC of machine learning methods with and without qualitative data.

Figure 5 graphically represents the accuracy of machine learning methods with and without qualitative data across three different timeframes: 3 days, 30 days, and 365 days. These charts are designed to visually emphasize the improvement in predictive accuracy when qualitative data is incorporated. By comparing the performance across different periods, the charts demonstrate how the integration of diverse data forms significantly enhances the models' ability to predict mortality in ICU settings more accurately and reliably.

Figure 5. Accuracy of machine learning methods with and without qualitative data.

 

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Report on: Integrating Structured and Unstructured Data with BERTopic and Machine Learning: A Comprehensive Predictive Model for Mortality in ICU Heart Failure Patients

 

General comments:

The mortality process of heart failure patients who are observed in an intensive care unit is examined by the authors. The novel analysis approach that is being offered considers multiple data acquisition methods. The incorporation of unstructured data improves the predicted performance of the model and illustrates the complementary advantages of natural language processing and machine learning. Although the manuscript appears to have been worked on with much effort, the analysis of the ICU mortality process did not yield any novel insights.

1.      The article uses research methods that are well known and widely described in the literature.

2.      Heart failure is a dynamic process that develops over time; sadly, the authors' analysis describes it as static.

3.      Unfortunately, the authors simply see the process of heart failure as one-dimensional and do not correlate it with other processes, despite the fact that other body processes also influence it.

I therefore recommend against this article's publication in Applied Sciences.

Specific comments

Line 11 and title: Explain the abbreviation 'ICU'.

Comments on the Quality of English Language

The writing is acceptable. 

Author Response

Dear Reviewer,

Thank you profoundly for your constructive feedback and invaluable guidance throughout the review process. In response to your insightful suggestions, we have made substantial amendments and additions to our manuscript, which we believe have significantly enhanced its quality and relevance to the special issue on Artificial Intelligence in Medicine and Healthcare.

Our research delves deeply into the themes outlined for this special issue, and we are grateful for the opportunity to contribute meaningful findings and discussions to the field. We respectfully request your consideration for the publication of our work, trusting that it offers significant contributions to the ongoing discourse in medical AI.

We are committed to continuing this line of research, guided by your expert advice, to further explore and address critical issues in this domain. We value the thoughtful critiques you have provided, which not only challenge us but also drive our research to greater depths. Thank you once again for your careful consideration and the opportunity to improve our work under your guidance.

 

General comments:

The mortality process of heart failure patients who are observed in an intensive care unit is examined by the authors. The novel analysis approach that is being offered considers multiple data acquisition methods. The incorporation of unstructured data improves the predicted performance of the model and illustrates the complementary advantages of natural language processing and machine learning. Although the manuscript appears to have been worked on with much effort, the analysis of the ICU mortality process did not yield any novel insights.

  1. The article uses research methods that are well known and widely described in the literature.

Corrections and adjustments:

Thank you for your insightful comments and recognition of the efforts invested in our manuscript. We value your critique regarding the novelty of our research approach and would like to emphasize the unique contributions that delineate our work. The primary goal of this study is to develop a robust mortality prediction model for ICU heart failure patients by integrating both structured and unstructured data. Leveraging advanced artificial intelligence technologies, notably the BERTopic model, our methodology effectively analyzes unstructured data, such as patients' medical notes from ICU settings. This approach is distinctive for its application of natural language processing to extract valuable insights from data sources that are often overlooked in conventional predictive models.

Our research extends beyond the traditional scope by applying these innovative techniques to a real-world medical database—specifically, the MIMIC database. Unlike prior studies, we harness the qualitative data available, employing BERTopic techniques to mine this dataset comprehensively. This method allows us to uncover patterns and predictive insights that remain hidden with standard analytical approaches. The novelty of our study lies in its dual focus on integrating diverse data types and applying these insights to improve predictive models for patient mortality, which are already well-established in the medical literature.

In addition to enhancing the predictive accuracy, our study contributes to the medical research community by integrating the latest findings in artificial intelligence. Responding to your suggestions, we have also supplemented our research foundation with the most recent literature, providing a more thorough context for our advancements. This integration provides healthcare professionals with more precise tools for medical decision-making, thereby improving patient care outcomes. We trust that our response adequately addresses your concerns regarding the novelty and significance of our research. We are committed to making meaningful advancements in the utilization of AI in healthcare, believing that our comprehensive and innovative approach significantly enriches the understanding of ICU mortality processes.

Albashayreh, A., Bandyopadhyay, A., Zeinali, N., Zhang, M., Fan, W., & Gilbertson White, S. (2024). Natural Language Processing Accurately Differentiates Cancer Symptom Information in Electronic Health Record Narratives. JCO Clinical Cancer Informatics, 8, e2300235.

Noaeen, M., Amini, S., Bhasker, S., Ghezelsefli, Z., Ahmed, A., Jafarinezhad, O., & Abad, Z. S. H. (2023). Unlocking the power of EHRs: harnessing unstructured data for machine learning-based outcome predictions. Paper presented at the 2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC).

Molloy-Paolillo, B., Mohr, D., Levy, D. R., Cutrona, S. L., Anderson, E., Rucci, J., . . . Rinne, S. T. (2023). Assessing Electronic Health Record (EHR) Use during a Major EHR Transition: An Innovative Mixed Methods Approach. Journal of General Internal Medicine, 38(Suppl 4), 999-1006.

 

  2. Heart failure is a dynamic process that develops over time; sadly, the authors' analysis describes it as static.

Corrections and adjustments:

We are grateful for your insightful critique highlighting the portrayal of heart failure as a static condition in our analysis. You correctly emphasize that heart failure is a dynamic process, evolving significantly over time. This observation is crucial as it underscores the complexities involved in managing and predicting outcomes for patients with heart failure.

In the present study, we concentrated on data gathered within the first 24 hours of ICU admission. This focus was intentionally chosen to highlight the early predictive capabilities of our model. Our objective was to illustrate the effectiveness of utilizing early data inputs to forecast mortality, providing clinicians with prompt and actionable insights that are critical during the initial treatment phase.

In response to your recommendations, we are planning to integrate time-series analysis into our future research endeavors. Such an approach will undoubtedly deepen our understanding and enhance the predictive accuracy concerning the clinical trajectory and management of heart failure patients. We anticipate that expanding our model to incorporate longitudinal data will make a substantial contribution to the existing body of knowledge and offer advanced tools for more effective patient care management.
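The restriction to data from the first 24 hours of ICU admission, described above, could be expressed as a simple time-window filter. The sketch below is only an assumption about how such a filter might look: the function `first_window_events` and the record field `charttime` are hypothetical names chosen for illustration, and the response does not state how the authors actually implemented the selection.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def first_window_events(
    events: List[Dict],
    admit_time: datetime,
    window_hours: int = 24,
) -> List[Dict]:
    """Keep events recorded in [admit_time, admit_time + window_hours).

    Assumption: each event is a dict with a datetime under the
    hypothetical key "charttime".
    """
    cutoff = admit_time + timedelta(hours=window_hours)
    return [e for e in events if admit_time <= e["charttime"] < cutoff]
```

A half-open interval is used so that an observation recorded exactly 24 hours after admission falls outside the early-prediction window.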

 

  3. Unfortunately, the authors simply see the process of heart failure as one-dimensional and do not correlate it with other processes, despite the fact that other body processes also influence it.

Corrections and adjustments:

Thank you for your insightful comments regarding the scope of our analysis of heart failure. We recognize the limitation in our current approach, which primarily viewed heart failure through a one-dimensional lens, focusing solely on cardiac function without considering the interplay with other physiological systems. Heart failure is indeed a multifaceted condition that is influenced by various bodily processes including renal function, endocrine changes, and neurohormonal systems. These interactions are critical to understanding the full spectrum of the disease and its impact on patient outcomes.

In response to your critique, we aim to broaden our future research to include multidimensional analysis that considers the complex interdependencies between heart failure and other physiological processes. This will involve not only a deeper dive into the pathophysiological mechanisms but also the integration of multi-system data to enhance the predictive accuracy and clinical relevance of our models. By adopting a more holistic approach, we hope to provide a more comprehensive understanding of heart failure, ultimately leading to better tailored and more effective interventions.

 

Specific comments

Line 11 and title: Explain the abbreviation 'ICU'.

Corrections and adjustments:

Thank you for pointing out the oversight regarding the abbreviation 'ICU' in Line 11 and in the title of our manuscript. We have now amended the text to define 'ICU' as 'Intensive Care Unit' at its first mention to ensure clarity for all readers. This correction has been applied throughout the document to maintain consistency and enhance readability.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The aim of this study is to develop a comprehensive prediction model for mortality in heart failure patients in the ICU, integrating structured and unstructured data. Based on artificial intelligence (BERTopic) for analyzing unstructured data, this approach shows significant promise for increasing the accuracy of prediction models. By combining structured and unstructured data through machine learning methods, this study aims to equip healthcare professionals with more accurate mortality prediction tools, ultimately improving treatment outcomes and medical decision-making. Incorporating unstructured data into health research facilitates a deeper understanding of patients' vital signs, personal histories and the diagnostic reports of medical staff. This research seeks to introduce a more comprehensive and insightful methodology into medical research, thus contributing to the continued advancement of predictive models in healthcare.

The authors present the results of a very interesting study that involved collecting quantitative information from ICU patients with congestive HF, including vital signs and basic data. Textual data was obtained from post-admission medical records. The authors filtered and pre-processed the data to ensure its quality and consistency. To analyze the textual data, BERTopic technology was used to generate topics, improving the understanding of the patients' textual descriptions. Seven commonly used machine learning methods were applied, namely GaussianNB, AdaBoost, Bagging, Gradient Boosting, MLP Classifier, LightGBM and XGBoost, in order to integrate quantitative and textual data to predict the mortality rate of HF patients in the ICU. Finally, the model's performance was evaluated using five metrics: Precision, Recall, F1-Score, Accuracy and AUROC. The authors combine various data analysis and machine learning techniques in order to establish a predictive model to assess the mortality rate of patients with congestive HF in the ICU, providing valuable information for clinical decision-making and treatment strategies.

The theme is well founded through a brief but consistent literature review. The objectives are correctly defined. The methodology is well-designed and is consistent with the objectives of the study. The interpretation and discussion of results is clear, objective, and consistent. The conclusions summarize well the results obtained and are consistent with the work presented.

The study conducted by the authors constitutes a strong point of this work.

Some minor aspects:

Figures 2 and 3 must be referred to in the text before they appear

p.7, line 266 - Review the colon used at the end of the paragraph

Tables 2 and 3 must be referred to in the text before they appear

Congratulations to the authors for their work!

Author Response

Dear Reviewer,

 

Thank you for your thorough and meticulous review of our manuscript. We are immensely grateful for the depth of understanding you have shown regarding the key points and contributions of our study. Your detailed feedback has not only affirmed the value of our research but has also guided significant refinements that enhanced the coherence, readability, and overall quality of our work.

 

We deeply appreciate the time and effort you invested in reviewing our work and providing constructive feedback. Your expertise has been invaluable in helping us present our findings more clearly and effectively. Thank you once again for your affirmations and for recognizing the potential impact of our research.

 

General comments:

The aim of this study is to develop a comprehensive prediction model for mortality in heart failure patients in the ICU, integrating structured and unstructured data. Based on artificial intelligence (BERTopic) for analyzing unstructured data, this approach shows significant promise for increasing the accuracy of prediction models. By combining structured and unstructured data through machine learning methods, this study aims to equip healthcare professionals with more accurate mortality prediction tools, ultimately improving treatment outcomes and medical decision-making. Incorporating unstructured data into health research facilitates a deeper understanding of patients' vital signs, personal histories and the diagnostic reports of medical staff. This research seeks to introduce a more comprehensive and insightful methodology into medical research, thus contributing to the continued advancement of predictive models in healthcare.

The authors present the results of a very interesting study that involved collecting quantitative information from ICU patients with congestive HF, including vital signs and basic data. Textual data was obtained from post-admission medical records. The authors filtered and pre-processed the data to ensure its quality and consistency. To analyze the textual data, BERTopic technology was used to generate topics, improving the understanding of the patients' textual descriptions. Seven commonly used machine learning methods were applied, namely GaussianNB, AdaBoost, Bagging, Gradient Boosting, MLP Classifier, LightGBM and XGBoost, in order to integrate quantitative and textual data to predict the mortality rate of HF patients in the ICU. Finally, the model's performance was evaluated using five metrics: Precision, Recall, F1-Score, Accuracy and AUROC. The authors combine various data analysis and machine learning techniques in order to establish a predictive model to assess the mortality rate of patients with congestive HF in the ICU, providing valuable information for clinical decision-making and treatment strategies.

The theme is well founded through a brief but consistent literature review. The objectives are correctly defined. The methodology is well-designed and is consistent with the objectives of the study. The interpretation and discussion of results is clear, objective, and consistent. The conclusions summarize well the results obtained and are consistent with the work presented.

The study conducted by the authors constitutes a strong point of this work.

  • Figures 2 and 3 must be referred to in the text before they appear
  • p.7, line 266 - Review the colon used at the end of the paragraph
  • Tables 2 and 3 must be referred to in the text before they appear

 

Corrections and adjustments:

Thank you for your attentive and detailed feedback. We appreciate your guidance in enhancing the clarity and structure of our manuscript. In response to your suggestions, we have carefully reviewed and adjusted the text, ensuring that all elements, including figures and tables, are introduced logically and effectively within the narrative. We have made these changes with the aim of improving the manuscript’s presentation and coherence. We hope these revisions meet your expectations and significantly enhance the quality of the manuscript.

 

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

Manuscript title: Integrating Structured and Unstructured Data with BERTopic and Machine Learning: A Comprehensive Predictive Model for Mortality in ICU Heart Failure Patients

The authors answered all the questions. The writing is acceptable. Applied Sciences journal has the potential to accept the paper.

Comments on the Quality of English Language

The writing is acceptable
