Next Article in Journal
Detection of Korean Phishing Messages Using Biased Discriminant Analysis under Extreme Class Imbalance Problem
Next Article in Special Issue
A Multimethod Approach for Healthcare Information Sharing Systems: Text Analysis and Empirical Data
Previous Article in Journal
Cybercrime Intention Recognition: A Systematic Literature Review
Previous Article in Special Issue
Advancing Tuberculosis Detection in Chest X-rays: A YOLOv7-Based Approach
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Addressing Data Scarcity in the Medical Domain: A GPT-Based Approach for Synthetic Data Generation and Feature Extraction

School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia
Information 2024, 15(5), 264; https://doi.org/10.3390/info15050264
Submission received: 25 March 2024 / Revised: 24 April 2024 / Accepted: 3 May 2024 / Published: 6 May 2024
(This article belongs to the Special Issue Information Systems in Healthcare)

Abstract

:
This research confronts the persistent challenge of data scarcity in medical machine learning by introducing a pioneering methodology that harnesses the capabilities of Generative Pre-trained Transformers (GPT). In response to the limitations posed by a dearth of labeled medical data, our approach involves the synthetic generation of comprehensive patient discharge messages, setting a new standard in the field with GPT autonomously generating 20 fields. Through a meticulous review of the existing literature, we systematically explore GPT’s aptitude for synthetic data generation and feature extraction, providing a robust foundation for subsequent phases of the research. The empirical demonstration showcases the transformative potential of our proposed solution, presenting over 70 patient discharge messages with synthetically generated fields, including severity and chances of hospital re-admission with justification. Moreover, the data had been deployed in a mobile solution where regression algorithms autonomously identified the correlated factors for ascertaining the severity of patients’ conditions. This study not only establishes a novel and comprehensive methodology but also contributes significantly to medical machine learning, presenting the most extensive patient discharge summaries reported in the literature. The results underscore the efficacy of GPT in overcoming data scarcity challenges and pave the way for future research to refine and expand the application of GPT in diverse medical contexts.

1. Introduction

The burgeoning field of medical machine learning confronts an ardent challenge—the paucity of comprehensive and clinically labeled training data [1,2]. The intricate nature of medical data, coupled with stringent privacy regulations, results in a scarcity that hampers the efficacy of machine learning models in healthcare applications. In particular, the insufficiency of labeled data exacerbates the predicament, impeding the ability to develop robust models capable of meaningful clinical insights [1,2].
This research endeavors to alleviate the constraints posed by the limited availability of labeled medical data by harnessing the unparalleled capabilities of Generative Pre-trained Transformers (GPT). In this study, we propose a novel approach that utilizes GPT to synthetically generate medical data, thereby circumventing the challenges associated with data scarcity. Moreover, GPT’s intrinsic ability to analyze and comprehend the synthetic data it generates opens avenues for the extraction of new features, offering a solution to the dearth of labeled data in the medical domain.
The first phase of our investigation involves a meticulous and systematic review of the existing literature, delving into the capabilities of GPT in synthetic data generation. By scrutinizing prior studies, we aim to provide a comprehensive understanding of GPT’s prowess in generating synthetic data for training machine learning models, thereby laying the groundwork for the subsequent phases of our research. Building upon the insights gleaned from the literature, our study proceeds to explore how GPT can not only generate synthetic data but also engage in the analysis of these datasets to extract novel features. Through a critical examination of existing methodologies, we seek to elucidate the potential of GPT in addressing the challenge of data scarcity from a holistic perspective.
As a practical demonstration of our proposed approach, we present a method for synthetically generating patient discharge messages using GPT, as conceptually represented in Figure 1. This pragmatic application serves as a testament to the feasibility and effectiveness of our proposed solution in tackling the limited availability of training data in the medical domain. Furthermore, we showcase how GPT can play a pivotal role in feature extraction from these synthetic patient discharge messages, illustrating its capability to mitigate the scarcity of labeled data (as shown in Figure 1). Through these empirical demonstrations, we aim to establish a robust foundation for the integration of GPT into the realm of medical machine learning, paving the way for enhanced model development in the face of data scarcity. Within the scope of this study, more than 70 patient discharge messages were automatically generated by the proposed GPT prompt. For all these discharge messages, seventeen fields were synthetically generated first, and then three more fields were generated for labeling these discharge message (e.g., severity, chances of hospital re-admission with justificaiton).
This study contributes to the current body of knowledge in the following ways:
  • Conducted a comprehensive review of existing literature to explore the utilization of GPT in the medical domain. Among twenty identified works, this study highlighted seven distinct research endeavors that employed GPT to generate or enhance medically relevant data [3,4,5,6,7,8,9].
  • Unlike previous studies that relied on manual utilization of GPT’s web interface (as shown in [3,4,5,6,7,8,9]), this research autonomously leveraged the GPT Application Programming Interface (API) alongside automation tools, enabling the efficient generation of a large volume of medically significant data.
  • Employing innovative prompt engineering techniques, this study generated 70 synthetic patient discharge messages encompassing seventeen fields and autonomously labeled these messages using GPT technology, resulting in the addition of three augmented fields.
  • The generated data underwent evaluation by medical professionals, yielding an impressive average precision, recall, and F1-score of 0.95, 0.97, and 0.96, respectively.
  • Furthermore, the synthetically generated medical data were subjected to machine learning algorithms such as regression to uncover hidden correlations among various parameters.
In essence, this research seeks to contribute a novel and comprehensive methodology to the growing body of knowledge addressing the challenges posed by data scarcity in the medical domain [1,2,10]. According to the literature and to the best of our knowledge, this is the first study to generate higly accurate (with F1-score of up to 97%) patient dischage summaries using GPT technology.

2. Literature Review

A recent study in [11] reviews the use of ChatGPT in various aspects of medical research. It evaluates the evidence of ChatGPT’s application in areas including but not limited to treatment, diagnosis, medication provision, drug development, medical report improvement, literature review writing, research conduct, data analysis, and personalized medicine. The review follows the PRISMA guidelines and encompasses studies published between 2022 and 2023. The paper in [12] explores the use of ChatGPT in the systematic review and meta-analysis process in medical research. The paper discusses how ChatGPT can be used for tasks like Risk of Bias analysis and data extraction from randomized controlled trials, highlighting the tool’s ability to reduce the time and effort required for these tasks. It directly addresses the use of ChatGPT in streamlining the process of conducting systematic reviews and meta-analyses, which are integral components of evidence-based decision making in healthcare [12]. The paper illustrates how AI, specifically ChatGPT, can assist in various steps of the systematic review process, including evaluating methodologies and extracting data. The study in [13] focuses on the application of ChatGPT in streamlining the literature selection process for meta-analysis in medical research. It outlines a methodology for using ChatGPT to facilitate the screening of titles and abstracts during meta-analysis, aiming to reduce workload while maintaining recall efficiency. The study includes a glioma meta-analysis for validation and discusses the development of a pipeline called LARS (Literature Records Screener) to assess the performance of ChatGPT in this context [13]. It deals directly with improving the efficiency and effectiveness of literature selection and screening in the context of meta-analysis, a crucial step in systematic reviews and research synthesis [13]. The research work in [14] discusses the potential public health risks posed by large language models like ChatGPT, specifically focusing on the spread of misinformation (infodemic). It explores the evolution of these models, their impact on scientific literature production, and the need for policies to mitigate misinformation risks. It focuses on the broader public health impact and ethical considerations of AI technology in disseminating information [14]. The paper in [15] focuses on evaluating the use of large language models (LLMs) in healthcare. It addresses the need for a comprehensive evaluation framework that assesses LLMs not just for their natural language processing performance but also for their translational value in healthcare. The paper discusses various aspects of LLMs in healthcare, ethical concerns, and proposes a framework for evaluating their application in this field. It goes beyond just the technical aspects of LLMs and delves into the ethical, governance, and practical implications of their use in healthcare [15]. This paper emphasizes a comprehensive evaluation that includes translational value assessment and ethical considerations [15]. The publication in [16] examines the potential influence of large language models like ChatGPT on the field of nuclear medicine. It discusses the capabilities of these models in generating human-like text, their impact on academic publishing, and the potential risks associated with their use in the context of nuclear medicine. It highlights issues like academic integrity, misinformation, and the challenges posed by AI in producing reliable medical content [16]. The focus is on the broader implications of using AI tools like ChatGPT in nuclear medicine, particularly concerning the reliability of the content produced and the ethical considerations surrounding their use in academic and clinical settings [16]. The discussion includes the potential for AI-generated content to influence academic integrity and the spread of misinformation, which are key concerns in the context of public health and ethical use of AI in medicine [16].
The paper in [3] explores the potential of AI, particularly large language models (LLMs) like GPT-4, in generating original scientific research. It discusses the use of GPT-4 to write an original pharmaceutics manuscript, including formulating a research hypothesis, defining an experimental protocol, producing photo-realistic images, generating analytical data, and writing a publication-ready manuscript. This study also examines the limitations of LLMs in referencing literature and emphasizes the need for human input in interpretation and data validation [3]. It focuses on the innovative use of LLMs to generate and augment data, such as creating believable analytical data and images for pharmaceutical research [3]. The emphasis on the AI model’s ability to conceive and execute a research hypothesis and generate multimodal data aligns with the aspects of data generation and augmentation [3]. Research work in [17] explores the applications of ChatGPT and other large language models in various aspects of orthopedics, including education, surgery, and research [17]. The study discusses how these AI tools can assist orthopedic clinicians and surgeons in tasks like disease diagnosis, surgical planning, and educational support. The focus is on the practical applications of ChatGPT in providing assistance to medical professionals in orthopedics, including aiding in diagnosis, surgery, and medical education, which aligns with the aspects of decision support and medical inquiry assistance [17]. The study in [18] presents a systematic review of the applications, benefits, and limitations of ChatGPT in healthcare education, research, and practice. The review includes an analysis of the potential benefits of ChatGPT in scientific writing, healthcare research, and practice, along with concerns regarding ethical, copyright, transparency, and legal issues [18]. Recent work in [19] examines the potential of AI systems, specifically large language models, in generating health awareness messages. The study uses the Bloom model for generating messages about folic acid, comparing them to highly retweeted human-generated messages in terms of quality and clarity. It also involves human and computational evaluations to assess the effectiveness of AI-generated messages in health communication. It focuses on the empirical assessment of AI-generated health messages, analyzing their effectiveness and comparing them to human-generated content [19]. The emphasis on computational and human evaluations of the messages aligns with the aspects of data analysis in medical research [19]. The study in [4] focuses on using GPT-3.5 for data augmentation to address vaccine hesitancy classification in the Dutch language. The study leverages the language model for generating realistic examples of anti-vaccination tweets and evaluates the impact of this augmentation on various machine learning models [4]. It also examines the ability of the synthetic data to generalize to human data in classification tasks. It illustrates the use of GPT-3.5 for generating synthetic data to balance an imbalanced dataset in vaccine hesitancy monitoring, highlighting its capabilities in data augmentation and labeling [4].
Recent work in [5] focuses on enhancing medical question answering systems using GPT-2 for question augmentation and T5-Small for topic extraction. The paper details a model that employs BERT, GPT-2, and T5-Small to improve medical question answering performance, demonstrating the effectiveness of these techniques through experiments [5]. It highlights the use of AI models for augmenting medical question data, a crucial aspect in improving the quality and coverage of datasets used in medical question answering systems [5]. The study in [6] examines the use of GPT-3 in generating synthetic data for Human–Computer Interaction (HCI) research. It explores the ability of GPT-3 to produce believable accounts of HCI experiences and discusses the potential benefits and risks associated with using synthetic data generated by language models. It highlights the use of GPT-3 for generating synthetic user research data, focusing on the model’s ability to create realistic and believable responses in an HCI context [6]. The paper in [7] presents a study on using GPT-2 for data augmentation in the context of patient outcome prediction. The focus is on generating artificial clinical notes in Electronic Health Records (EHRs) to improve the training of machine learning models for predicting patient outcomes, such as readmission rates. The paper discusses a novel textual data augmentation method and evaluates its effectiveness in enhancing predictive performance of deep learning models in healthcare [7]. It explores the use of GPT-2 to augment medical datasets, specifically focusing on generating textual data that can be used to train models for predicting patient outcomes, aligning with data augmentation and labeling aspects [7]. The research work in [8] focuses on using GPT-2 to generate synthetic biological signals, specifically EEG (electroencephalography) and EMG (electromyography), to enhance data classification. The study demonstrates that models trained on synthetic data generated by GPT-2 can classify real EEG and EMG datasets with significant accuracy and that the inclusion of synthetic data during training improves classification performance [8]. It emphasizes the use of AI for generating synthetic biological signals, which augments the available data for training machine learning models in the field of biological signal processing [8]. The paper in [9] focuses on using Transformer-based models, particularly GPT-2, for generating synthetic medical text to augment datasets. The study experiments with these models for data augmentation in clinically relevant NLP tasks such as unplanned readmission prediction and phenotype classification. It evaluates the effectiveness of synthetic data in improving the performance of deep learning models in these healthcare contexts [9]. It highlights the application of AI models in creating synthetic medical text data, aiming to augment existing datasets for improved model training and performance in specific medical tasks [9]. Finally, the paper in [20] discusses the potential of ChatGPT in various medical applications. It examines ChatGPT’s ability to develop AI programs for medicine, its limitations and challenges, ethical concerns like biases and patient confidentiality, and compliance with healthcare regulations. The paper highlights ChatGPT’s potential in democratizing coding and developing AI in medicine, leading to breakthroughs in the medical AI sector [20]. The focus on ethical concerns, patient autonomy, and the responsible use of AI in medicine, along with the exploration of AI’s potential to revolutionize medical research and practice, aligns with this category [20]. These existing research works could be categorized into six distinct categores, as described in Figure 2.
  • Literature Review and Meta-Analysis: Studies in [11,12,13,18] illustrate how AI, specifically ChatGPT, can streamline literature reviews and meta-analyses, aiding in efficient data extraction and evaluation methodologies.
  • Data Analysis: As demonstrated in [21,22,23,24,25,26], GPT assists in analyzing research data and generating critical insights. Within the medical domain, research works in [11,19] demonstrate AI’s utility in analyzing complex datasets, including patient outcomes and health message effectiveness, enhancing predictive modeling and comprehension of medical data.
  • Medical Question Answering and Decision Support Systems: Studies like [11,17,18] show the role of AI in assisting medical professionals with accurate information, aiding diagnosis, and providing decision support in clinical settings.
  • Drug Discovery and Clinical Trial Analysis: While not directly covered in the reviewed articles, this category involves using AI to accelerate drug discovery processes and analyze clinical trial data, potentially enhancing the efficiency and efficacy of pharmaceutical development [11].
  • Ethical and Public Health Implications of AI in Medicine: Several recent studies like [11,14,15,16,18,20] discuss the broader ethical implications and public health concerns of AI in medicine, including misinformation and academic integrity.
  • Data Generation, Augmentation, and Labeling: To generate new features from data with limited fields, machine learning techniques like entity recognition, category classification, sentiment analysis, and others have traditionally been used [27,28,29,30,31,32,33,34]. After generating new features, the augmented data can be used to effectively train the machine learning models [27,28,29,30,31,32,33,34]. However, with the advent of GPT, new features could be generated either from synthetic data or from existing data, without using traditional feature extraction approaches, as shown in [35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70]. Even within the medical domain, synthetic data creation, data augmentation, and labelling have been proven to be crucial in recent times [3,4,5,6,7,8,9]. These papers illustrate the use of AI for creating and enhancing medical datasets, crucial for training robust machine learning models.
Finally, Table 1 clearly depicts how existing research works on using GPT in the medical domain could be categorized. As shown in Table 1, most of the existing liturature falls under the category of “Data Generation, Augmentation, and Labeling”. Within the next section, a practical scenario of how GPT could be used to generate synthetic medical data as well as how to generate labels for these synthetic data will be detailed.

3. Methods

The GPT model is based on the Transformer architecture, which involves several key components, like Input Embedding and Positional Encoding, Transformer Blocks, Feed-Forward Neural Network, Normalization and Residual Connections, and Output layer [71].

3.1. Input Embedding and Positional Encoding

  • Each input token (word or sub-word) is converted into a vector through an embedding layer.
  • Positional encodings are added to these embeddings to provide information about the position of each token in the sequence.
  • The combined embedding, E, is given by Equation (1).
E = E t o k e n + E p o s i t i o n

3.2. Transformer Blocks

Each block consists of two main parts, the Multi-Head Self-Attention mechanism and the Feed-Forward Neural Network.
  • Multi-Head Self-Attention:
    • The attention mechanism can be described by Equation (2).
      A t t e n t i o n Q , K , V = s o f t m a x Q K T d k V
    • In Equation (2), Q, K, V are the query, key, and value matrices, and dk is the dimension of the keys.
    • In multi-head attention, this process is carried out in parallel multiple times with different, learned linear projections of the queries, keys, and values. The outputs are then concatenated and linearly transformed.
  • Feed-Forward Neural Network:
    • Each layer contains a fully connected feed-forward network, which is applied to each position separately and identically. This typically involves two linear transformations with a ReLU activation in between. It is represented with Equation (3).
F F N x = max 0 , x W 1 + b 1 W 2 + b 2

3.3. Normalization and Residual Connections

  • Each sub-layer (self-attention, feed-forward) in a transformer block has a residual connection around it, followed by layer normalization.
  • The output of each sub-layer is L a y e r N o r m ( x + S u b l a e r x ) , where S u b l a e r x is the function implemented by the sub-layer itself.

3.4. Output Layer

  • The final layer is a linear transformation followed by a softmax function to predict the probability of the next token in the sequence.
  • The output probabilities for a token are computed as s o f t m a x ( x W + b ) , where W and b are the weights and biases of the output layer.
This mathematical framework enables GPT to capture complex patterns and relationships in sequential data [71] and is used in this study to generate synthetic patient discharge messages and even perform analysis on those discharge messages for assessing severity and chances of hospital readmission.

3.5. The Process of Automating Synthetic Medical Data Generation

In the conventional approach, users of GPT technology access the model through its web interface, initiating interactions via specific prompts to derive outputs from the system (as shown in Figure 3). This traditional approach has been demonstrated by research works [3,4,5,6,7,8,9]. Employing such a traditional methodology to produce synthetic medical data necessitates substantial user involvement, which can be time-consuming. To circumvent the need for manual intervention in querying the GPT interface, the current study integrates the GPT API with Microsoft Power Automate to fully automate the process of generating patient discharge summaries, as shown in Figure 3. Microsoft Power Automate orchestrates the interactions with the GPT through its API, facilitating a seamless automated workflow. Consequently, this novel automation strategy enhances the efficiency and effectiveness of generating synthetic patient discharge messages, thus streamlining the process significantly. As seen from Figure 3, the proposed approach of interacting with ChatGPT API is automated, fast, and efficient.
As seen in Figure 4, the orchestration of GPT API communication is performed using Microsoft Power Automate. The HTTP request component of Microsoft Power Automate can autonomously invoke multiple API calls. As shown in Figure 4, the first HTTP post call to GPT API generates 70 discharge messages. The second HTTP post call then critically evaluates these messages and labels them in terms of (1) severity, (2) chances of hospital readmission, and (3) reasoning. The details of both these calls are shown in Figure 5. It should be noted that Microsoft Power Automate allows the second prompt to investigate the previously generated synthetic message through the variable “Output”, as shown in Figure 5b. Thus, the contextual background of the previously generated messages could be efficiently analyzed in the second prompt, along with augmenting the previous messages with newer labels (i.e., severity, chances of hospital readmission, and reasoning). The reasoning information would be validated by expert doctors at a later stage.
As shown in Figure 1 and Figure 5, a specially engineered GPT prompt can be used for generating patient discharge messages. Microsoft Power Automate with GPT API automatically generates patient discharge summaries with specifically guided headings, like Diagnosis, Treatment, Patient Instructions, Medications on Discharge, etc. The complete list can be seen from Appendix A using the prompt of Box 1. Many of these headings (presented in Appendix A) are required for assessment of severity and predicting the chances of hospital readmission, which would be performed in the next stage. As seen from Figure 6, Figure 7, Figure 8 and Figure 9, GPT generated the discharge summaries synthetically (i.e., not real patient information).
Box 1. Generating Synthetic Patient Discharge Summaries.
Generate patient discharge summary with following fields: Patient Name, Age, Gender, Date of Admission, Date of Discharge, Admitting Physician, Discharging Physician, Reason for Admission, Treatment and Surgical Procedures, Patient’s Response to Treatment, Medical History, Hospital Course, Follow-up, Patient Instructions, Final Diagnosis, Discharge Condition, and Discharge Medications. Detailed single line response with each field separated with “|” character.
Images of the patients could also be generated by adding prompt of Box 2 along with Box 1.
Box 2. Generating the Images of the Patients Using the Information from Discharge Summaries.
Based on the description of the generated discharge summary, generate an image of that patient.
For Alex Johnson (Figure 6), the GPT response before generating the synthetic patient image is “Based on this summary, I will create an artistic representation of Alex Johnson, a 38-year-old male who has just recovered from an appendectomy. Let’s visualize Alex as having short brown hair, a medium build, and a friendly appearance, reflecting his recovery phase”.
As shown earlier in Figure 1 from the synthetically generated discharge summaries, GPT can effectively be used for generating new features. Figure 5b and Figure 10 illustrate this process further. As seen from Figure 10, critical information (e.g., nature of their medical conditions, treatments received, and the instructions provided upon discharge) are used for generating new features like severity of condition and change of hospital readmission. Box 3 shows the GPT prompt used for this feature augmentation process (as previously demonstrated in Figure 5b).
Box 3. Generating New Features for Labeling the Discharge Messages.
Rate the severities of these patients along with their chance of hospital readmission for each of these patients.
As seen from Figure 10, for Alex Johnson (i.e., discharge summary presented in Figure 6), GPT assessed the severity of his condition to be “Moderate” and the changes of hospital readmission to be “Low to Moderate”. This process can be effectively used to label the synthetic data as low, moderate, high, etc., and could be efficiently used to train machine learning models at a later stage. The same methodology could be used for generating synthetic electrocardiogram signals or other bio-signals as well as labelling these signals. Hence, GPT to solve GPT is presented as an effective solution towards solving data scarcity as well as fewer labels in the medical domain.

4. Results

Using the methodology detailed in the previous section, within this study, 70 patient discharge summaries were synthetically generated. As seen from Table 2, these patient discharge summaries had 20 fields comprising Patient Name, Age, Gender, Date of Admission, Date of Discharge, Admitting Physician, Discharging Physician, Reason for Admission, Treatment and Surgical Procedures, Patient’s Response to Treatment, Medical History, Hospital Course, Follow-up, Patient Instructions, Final Diagnosis, Discharge Condition, Discharge Medications, Severity Level, Probability of Hospital Re-admission, and Reasoning. As mentioned in the previous section, the first 17 fields were generated with GPT Prompt 1 and then labelling information (i.e., Severity Level, Probability of Hospital Re-admission, and Reasoning) was generated with Prompt 2. Appendix A shows the details of these 70 generated discharge summaries. Out of these 20 fields, only Age was numeric in nature, and as a result, Table 3 provides various statistics on this numeric field. The value of Age ranged between 23 and 89. There were two date fields, namely date of admission and date of discharge.
Date of admission ranged from 12 January 2021 to 20 December 2021. Date of discharge ranged from 20 January 2021 to 30 December 2021. From these date fields, the duration of hospital stay could be calculated. Hospital stay ranged from 3 (for Sophie Duncan) to 334 days (Maria Johnson). Finally, Figure 11 shows the distributions of labeling data (i.e., Severity level and Chances of Hospital Re-admission). As seen from Figure 11, 12.86% of the discharge summaries were labeled with the severity level of high and 67.14% of the discharge summaries were labeled with severity level being low. In terms of hospital re-admission, 60% of cases were moderate, 24.29% of cases were low, and 15.71% of the cases were flagged as “moderate to high”.
The last three columns in Table 3, namely Severity Level, Probability of Hospital Re-admission, and Reasoning, were generated anew using Prompt 3. This additional information was autonomously generated by GPT, as demonstrated in Figure 5b. Given that GPT was instructed to act as a medical professional in generating these details, the augmented data underwent evaluation by two medical experts.
The evaluation results are depicted in Table 4, revealing an average precision, recall, and F1-score of 0.95, 0.97, and 0.96, respectively, across all three labeled tasks. This indicates GPT’s capability to automatically label medical data with a high level of accuracy. Notably, in Table 4, the F1-Score was highest, at 97% for reasoning, followed by severity and likelihood of hospital admission. This manual validation process underscores the potential for utilizing GPT and related technologies with confidence in generating and enhancing synthetic medical data.
Other than manually evaluating the validity of generated information, machine learning algorithms could also be used on the generated synthetic data for obtaining AI-driven insights [72]. The next section will discuss how machine learning algorithms could be used on these synthetic data for obtaining AI-driven insights.

5. Discussion and Concluding Remarks

This research introduces a groundbreaking methodology to address the challenge of data scarcity in medical machine learning by leveraging the capabilities of GPT. The study proposes a comprehensive approach that utilizes GPT for synthetic data generation and subsequent feature extraction, offering a transformative solution to the limitations imposed by the scarcity of labeled medical data. The empirical demonstration involving the synthetic generation of patient discharge messages serves as a practical testament to the feasibility and effectiveness of the proposed methodology, showcasing its potential to revolutionize the integration of GPT into the realm of medical machine learning. Figure 12 shows the deployment of the GPT-based solution in the latest Samsung Galaxy S23 Ultra mobile phone using Microsoft Power BI’s deployed App. The application of this deployment process has been showcased in recent studies through the utilization of low-code platforms [27,30,31,32,34]. As this study exclusively solved the labeled data scarcity for training machine learning models within medical domain (as discussed in [1,2,10]), it needs to be demonstrated how the generated synthetic data could be used in machine leanirng. Figure 12 shows that automated regression identified “Hospital Stays” to be highly corelated with the severity of the patient. The AI-driven insight shown in Figure 12 (within Samsung Galaxy S23 Ultra Mobile) shows that out of the 19 fields, Patient’s Age, Chance of Hospital readmission, and Hospital stays are correlated with severity. This automated regression using “Key Influencer” visualization of Microsoft Power BI has been reported in [73]. The previous section evaluated the validity of the generated medical data using manual evaluation by an expert medical professional. Now, this section demonstrates the use of the automated machine learning algorithm (i.e., regression to obtain the correlated variables) on the synthetic data.
In summary, this study presents a pioneering and thorough methodology designed to address the data scarcity issues faced by researchers and scientists in the medical field. Leveraging this approach, automation tools such as Microsoft Power Automate were employed alongside the ChatGPT API to not only generate synthetic medical data automatically but also to label these datasets autonomously. The labeling process conducted by GPT was manually assessed by medical experts, yielding an impressive F1-score of 97%. Additionally, machine learning techniques, including regression analysis, were applied to the synthetic data, affirming the validity of the generated information. The integration of ChatGPT API’s synthetic data generation and feature extraction capabilities not only facilitates the development of more robust machine learning models for healthcare applications but also sets the stage for future research endeavors. Future works should explore the application of GPT across diverse medical datasets, optimize its capabilities for specific contexts, and delve into the ethical implications of deploying synthetic data in medical research. This study lays the foundation for a trajectory of research that promises to redefine the landscape of medical machine learning, ultimately benefiting both researchers and clinicians in their pursuit of improved healthcare outcomes.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data attached within this paper.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A

Table A1. Seventy patient discharge summaries generated with GPT.
Table A1. Seventy patient discharge summaries generated with GPT.
Patient NameAgeGenderDate of AdmissionDate of DischargeAdmitting PhysicianDischarging PhysicianReason for AdmissionTreatment and Surgical ProceduresPatient’s Response to TreatmentMedical HistoryHospital CourseFollow-UpPatient InstructionsFinal DiagnosisDischarge ConditionDischarge MedicationsSeverity LevelProbability of Hospital Re-AdmissionReasoning
John Doe34Male1/1/20212/2/2021Dr. SmithDr. WilliamsAcute appendicitisAppendectomyPatient responded well to surgical interventionNo significant past medical historyPatient underwent successful appendectomy, recovered without complicationsTo review in outpatient clinic after 1 weekLight diet, rest and wound careFinal diagnosis of acute appendicitisStable at dischargePrescribed antibiotics, painkillers, and laxativesModerateLowSeverity based on condition ‘Acute appendicitis’. Readmission probability based on discharge condition ‘Stable at discharge’ and medical history.
Maria Johnson56Female1/12/202112/12/2021Dr. JohnsonDr. RobinsonStrokeIV Thrombolysis, Physical therapySignificant improvement in mobility and speechHistory of hypertension and heart diseasePatient received thrombolysis within time limit and underwent intense rehabTo review in stroke clinic after 4 weeksMedication compliance, regular exercise, and healthy dietFinal diagnosis of ischemic strokeFunctional improvements, stable at dischargePrescribed blood thinners, statins, and antihypertensivesHighModerate to HighSeverity based on condition ‘Stroke’. Readmission probability based on discharge condition ‘Functional improvements, stable at discharge’ and medical history.
Susan Harris38Female3/15/20213/20/2021Dr. RussoDr. MurrayGallstonesLaparoscopic cholecystectomyPatient responded well to surgeryNo significant past medical historySurgery was uncomplicated and patient recovered without issueFollow up with primary care in 2 weeksMaintain low-fat dietFinal diagnosis of cholelithiasis and cholecystitisStable, full recovery anticipatedPrescribed painkillers and antibiotics.ModerateLowSeverity based on condition ‘Gallstones’. Readmission probability based on discharge condition ‘Stable, full recovery anticipated’ and medical history.
James Thompson69Male2/1/20212/7/2021Dr. WhiteDr. BlackChest pain, confirmed as myocardial infarctionAngioplasty and stent placementPatient showed remarkable improvement post-procedureHas a history of diabetes and hypertensionPatient had a successful procedure and was monitored in ICU for a day. Released later to general wardCardiology follow-up in one monthLifestyle modification, medication complianceAcute anterior wall Myocardial InfarctionStable at dischargeMedications including antiplatelets, beta-blockers, ACE inhibitors, statins and anti-diabetic regimen.HighModerate to HighSeverity based on condition ‘Myocardial Infarction’. Readmission probability based on discharge condition ‘Stable at discharge’ and medical history of diabetes and hypertension.
Elizabeth Davis42Female4/10/20214/15/2021Dr. TurnerDr. WalkerPneumoniaAntibiotics treatment and respiratory therapyPatient’s condition improved significantlyPreviously healthy with no significant medical historyTreated with IV antibiotics and oxygen through nasal cannulaPulmonary follow-up in 3 weeksCompletion of oral antibiotic course, rest, and hydrationFinal diagnosis of community-acquired pneumoniaImproving at dischargeOral antibiotics and bronchodilator inhaler.ModerateLowSeverity based on condition ‘Pneumonia’. Readmission probability based on discharge condition ‘Improving at discharge’ and previously healthy status.
David Wilson57Male10/21/202110/31/2021Dr. MorrisDr. WrightLiver failureSupportive care, liver transplant assessmentSlow but steady improvementHistory of alcoholism and Hepatitis CPatient managed with diuretics and lactulose, assessed for transplant suitabilityFollow-up with hepatology team in 1 weekAvoidance of alcohol, low salt dietEnd-stage liver diseaseStable at discharge, with close outpatient monitoringPrescribed diuretics, lactulose, and multivitamins.HighModerate to HighSeverity based on condition ‘Liver failure’. Readmission probability based on discharge condition ‘Stable at discharge, with close outpatient monitoring’ and medical history of alcoholism and Hepatitis C.
Anna Taylor89Female5/9/20215/16/2021Dr. SimmonsDr. MitchellHip fracture after fallHip pinning surgeryGradual improvement with physical therapyOsteoporosis, past history of fallsSurgery was successful with no complications, physiotherapy started postoperativelyOrtho follow-up after 2 weeksPhysical therapy, fall precautions at homeFemoral neck fractureStable with improving mobilityAnalgesics and Calcium and Vitamin D supplements.ModerateLowSeverity based on condition ‘Hip fracture’. Readmission probability based on discharge condition ‘Stable with improving mobility’ and medical history of osteoporosis.
Michael Anderson72Male6/19/20216/28/2021Dr. YoungDr. HernandezProstate cancerProstatectomyWell tolerated procedure with good recoveryPast history of asthmaSurgery completed successfully and patient made steady progress in recoveryUrology follow-up after 1 monthMedication compliance, report any urinary difficultiesProstate adenocarcinomaStable at dischargePrescribed painkillers and inhaled corticosteroids. HighLowSeverity based on condition ‘Prostate cancer’. Readmission probability based on discharge condition ‘Stable at discharge’ and past history of asthma.
Patricia Lee52Female8/1/20218/7/2021Dr. MorrisDr. HallBreast CancerLumpectomy and radiationGood recovery with no post-op complicationsFirst degree relative with breast cancerSurgery completed with clear margins, initiated on post-op radiationOncology follow-up in 1 weekHealthy diet, regular exercise, follow recommended screening guidelinesBreast Cancer, stage IIaStable at dischargePrescribe painkillers and anti-emetics. HighModerate to HighSeverity based on condition ‘Breast Cancer’. Readmission probability based on discharge condition ‘Stable at discharge’ and family history of breast cancer.
Jacob Martinez30Male11/5/202111/9/2021Dr. KingDr. GonzalezAcute pancreatitisFluid resuscitation and supportive careImproved significantly with treatmentHistory of gallstonesPatient received IV fluids and pain managementGI follow up in 2 weeksLow-fat diet, avoid alcohol, medication complianceAcute pancreatitisImproved, stable at dischargePrescribed pain medication and proton pump inhibitors.LowModerate to HighSeverity based on condition ‘Acute pancreatitis’. Readmission probability based on discharge condition ‘Improved, stable at discharge’ and medical history of gallstones.
Melissa Martin65Female9/19/202110/1/2021Dr. ThompsonDr. MooreType 2 Diabetes ComplicationsInsulin Therapy, Diabetic EducationPatient responded well to therapyLong-standing history of Type 2 DiabetesPatient was educated about the importance of regular blood sugar monitoring, diet and exerciseEndocrinology follow up in 1 monthRegular blood sugar monitoring, maintain balanced diet, regular exerciseUncontrolled Type 2 DiabetesStable at dischargeInsulin and oral hypoglycemic agents. HighModerate to HighSeverity based on condition ‘Type 2 Diabetes Complications’. Readmission probability based on discharge condition ‘Stable at discharge’ and medical history of Type 2 Diabetes.
Jason Jackson45Male5/22/20216/1/2021Dr. RobertsDr. LopezTraumatic Brain InjuryDebulking surgery, rehabilitationPatient showed gradual improvementNo remarkable past medical historyPatient underwent surgery and was transferred to rehabilitation post-stabilizationNeurosurgery follow-up in 1 weekOngoing rehabilitation, medication adherenceTraumatic Brain InjuryFair condition at dischargePrescribed anticonvulsants and analgesics. ModerateModerate to HighSeverity based on condition ‘Traumatic Brain Injury’. Readmission probability based on discharge condition ‘Fair condition at discharge’ and medical history.
Linda Ramos70Female12/10/202112/20/2021Dr. ReedDr. JenkinsChronic Obstructive Pulmonary Disease (COPD) exacerbationInhaler therapy, steroids, antibioticsPatient’s breathing improved significantlyHistory of smoking and COPDManaged with nebulizers, steroids and antibioticsPulmonary follow-up in 2 weeksSmoking cessation, use inhalers as instructedChronic Obstructive Pulmonary Disease, acute exacerbationStable at dischargePrescribed inhalers, steroids and antibiotics.HighModerate to HighSeverity based on condition ‘COPD exacerbation’. Readmission probability based on discharge condition ‘Stable at discharge’ and history of smoking and COPD.
Joshua White62Male7/7/20217/12/2021Dr. FosterDr. SimmonsHeart failure exacerbationDiuretics, ACE inhibitors, lifestyle modificationPatient’s condition improved and stabilizedHistory of hypertension and heart diseaseManaged with medications and patient education about lifestyle changesCardiology follow-up in 1 monthRegular exercise, low sodium diet, medication complianceCongestive Heart Failure, acute exacerbationStable at dischargePrescribed diuretics, ACE inhibitors and beta blockers. HighModerate to HighSeverity based on condition ‘Heart failure exacerbation’. Readmission probability based on discharge condition ‘Stable at discharge’ and history of hypertension and heart disease.
Emma Bailey88Female9/15/202110/1/2021Dr. RussellDr. WatsonAlzheimer’s disease, behavioral changesAdjustment of medications, behavioral therapyGradual improvement in sleep pattern and agitationLong-standing Alzheimer’s diseasePatient was managed with adjustment of Alzheimer’s medications and behavioral techniquesNeurology follow-up in 1 monthRoutine, structured day, family supportAlzheimer’s disease with behavioral complicationsStable at dischargePrescribed Donepezil, antipsychotics and sleep aids.ModerateLowSeverity based on condition ‘Alzheimer’s disease, behavioral changes’. Readmission probability based on discharge condition ‘Stable at discharge’ and long-standing Alzheimer’s disease.
Michael Cox77Male12/20/202112/30/2021Dr. RogersDr. BennettComplications of Chronic Kidney DiseaseDialysis, nutritional counselingPatient’s renal function improved significantlyHistory of Chronic Kidney Disease and HypertensionManaged with dialysis and medicationsNephrology follow-up in 2 weeksLow sodium, low potassium diet, medication complianceChronic Kidney Disease, stage VStable at dischargePrescribed blood pressure medications, phosphate binders and erythropoietin.HighModerate to HighSeverity based on condition ‘Complications of Chronic Kidney Disease’. Readmission probability based on discharge condition ‘Stable at discharge’ and history of Chronic Kidney Disease and Hypertension.
Sarah Walker64Female7/25/20218/5/2021Dr. RichardsonDr. HughesGastritisAntacid administration, dietary changesPatient experienced reduction of symptomsHistory of gastritis and GERDManaged with antacids and dietary changesFollow-up appointment with gastroenterologist in 3 weeksAvoid spicy food, medication complianceAcute gastritisStable at dischargePrescribed Proton-pump inhibitors.LowLowSeverity based on condition ‘Gastritis’. Readmission probability based on discharge condition ‘Stable at discharge’ and history of gastritis and GERD.
Christopher Cooper85Male8/1/20218/10/2021Dr. RamirezDr. HillRheumatoid Arthritis painPain medication adjustment, physical therapyPatient’s mobility improved and pain reducedHistory of Rheumatoid ArthritisPain management approach adjusted, PT introducedFollow-up with Rheumatologist in 2 weeksPhysical therapy exercises, medication complianceRheumatoid arthritis with acute flareStable at dischargePrescribed NSAIDs, steroids, DMARDs.ModerateLowSeverity based on condition ‘Rheumatoid Arthritis pain’. Readmission probability based on discharge condition ‘Stable at discharge’ and history of Rheumatoid Arthritis.
Amanda Bell59Female11/15/202111/25/2021Dr. GrahamDr. MeyerDepressionCognitive Behavioral Therapy, medication adjustmentPatient’s mood improved with treatmentHistory of Major Depressive DisorderTreatment included medication adjustment and therapyPsychiatry follow-up in 1 weekMaintenance of therapy schedule, medication complianceMajor Depressive Disorder, recurrent, moderateStable at dischargePrescribed SSRIs and benzodiazepines.ModerateModerate to HighSeverity based on condition ‘Depression’. Readmission probability based on discharge condition ‘Stable at discharge’ and history of Major Depressive Disorder.
Anthony Reyes73Male9/20/202110/1/2021Dr. JenkinsDr. GordonSevere HypertensionIncrease in antihypertensives, lifestyle modificationsPatient’s blood pressure reduced and stabilizedLong-standing history of HypertensionManaged with an increase in hypertension medication and lifestyle modificationsCardiology follow-up in 2 weeksRegular exercise, weight loss, low sodium diet, medication complianceExtremely high blood pressureStable at dischargePrescribed ACE inhibitors, Diuretics.ModerateLowSeverity based on condition ‘Severe Hypertension’. Readmission probability based on discharge condition ‘Stable at discharge’ and long-standing history of Hypertension.
Olivia Ward32Female10/21/202110/30/2021Dr. ColeDr. CookPregnancy with hypertensionBed rest, blood pressure medicationsBlood pressure controlled with no distress to fetusNo significant past medical historyManaged with bed rest and blood pressure medications, and regular monitoring of fetusObstetrics follow-up in 1 weekBed rest, medication compliance, regular antenatal checksGestational HypertensionStable at dischargePrescribed labetalol.ModerateLowSeverity based on condition ‘Pregnancy with hypertension’. Readmission probability based on discharge condition ‘Stable at discharge’ and no significant past medical history.
William Howard56Male6/12/20216/23/2021Dr. BaylorDr. BlackPneumoniaAntibiotics, respiratory therapyPatient’s condition improved significantlyHistory of COPDTreated with IV antibiotics and oxygen therapyPulmonary follow-up in 1 monthComplete antibiotic course, smoking cessation adviceFinal diagnosis of community-acquired pneumoniaStable at dischargePrescribed oral antibiotics and inhalers.ModerateModerateSeverity based on condition ‘Pneumonia’. Readmission probability based on discharge condition ‘Stable at discharge’ and history of COPD.
Ava Davis43Female8/5/20218/15/2021Dr. CraigDr. HoustonAsthma exacerbationBronchodilators, steroids, inhaler technique reviewImprovement in asthma controlLong-standing asthmaTreated with bronchodilators and steroids, inhaler technique revisedPulmonary follow-up in 2 weeksAvoid triggers, use inhaler as instructedAsthma exacerbationStable at dischargePrescribed inhalers and oral steroids.LowModerateSeverity based on condition ‘Asthma exacerbation’. Readmission probability based on discharge condition ‘Stable at discharge’ and long-standing asthma.
Benjamin Turner66Male7/14/20217/24/2021Dr. FosterDr. ReedDiabetic foot ulcerWound care, blood sugar control, antibioticsSlow healing but progress with woundHistory of type 2 diabetes, peripheral neuropathyManaged with wound care, foot off-loading, and blood sugar controlEndocrinology follow-up in 1 monthFoot care, blood sugar control, follow up checkDiabetic foot ulcerStable at dischargePrescribed insulin, oral hypoglycemic, topical and oral antibiotics.LowModerateSeverity based on condition ‘Diabetic foot ulcer’. Readmission probability based on discharge condition ‘Stable at discharge’ and history of type 2 diabetes, peripheral neuropathy.
Charlotte Simmons31Female3/22/20214/1/2021Dr. ThompsonDr. JohnsonEctopic PregnancyLaparoscopic surgerySafe recovery post-surgeryPrior ectopic pregnancyEctopic pregnancy removal via laparoscopic approachOb-Gyn follow-up in 2 weeksRest, avoid lifting heavy weights, medication complianceFinal diagnosis of ectopic pregnancyRapid recovery at dischargePrescribed painkillers and oral contraceptives.LowModerateSeverity based on condition ‘Ectopic Pregnancy’. Readmission probability based on discharge condition ‘Rapid recovery at discharge’ and prior ectopic pregnancy.
Daniel Rodriguez58Male6/15/20216/26/2021Dr. BrooksDr. DavisCoronary artery diseaseAngioplasty and stent placementSignificant improvement post-procedureHistory of smoking and hypertensionProcedure successful with no complications, smoking cessation advice givenCardiology follow-up in 1 monthSmoking cessation, regular exercise, medication complianceFinal diagnosis of coronary artery diseaseStable at dischargePrescribed antiplatelets, beta-blockers, ACE inhibitors, statins.LowModerateSeverity based on condition ‘Coronary artery disease’. Readmission probability based on discharge condition ‘Stable at discharge’ and history of smoking and hypertension.
Lily Morris76Female7/30/20218/9/2021Dr. CarterDr. CollinsUrinary tract infectionAntibiotics and hydrationResolved with treatmentHistory of recurring UTIsTreated with antibiotics, urinary culture guided treatmentUrology follow-up in 3 weeksHydration, wipe front to back, medication complianceFinal diagnosis of urinary tract infectionResolved at dischargePrescribed oral antibiotics.LowModerateSeverity based on condition ‘Urinary tract infection’. Readmission probability based on discharge condition ‘Resolved at discharge’ and history of recurring UTIs.
Noah Taylor69Male6/23/20217/1/2021Dr. HowardDr. BennettPulmonary embolismAnticoagulation therapySymptoms improved with treatmentPast history of deep vein thrombosisIV anticoagulation followed by oral therapy to maintain INRHematology follow-up in 1 weekAvoid activities that can lead to falls, medication complianceFinal diagnosis of pulmonary embolismStable at dischargePrescribed oral anticoagulants.LowModerateSeverity based on condition ‘Pulmonary embolism’. Readmission probability based on discharge condition ‘Stable at discharge’ and past history of deep vein thrombosis.
Zoe Parker54Female8/24/20218/31/2021Dr. MartinDr. MartinezCrohn’s disease flareSteroids, infliximab infusionsResponse to treatment with symptom resolutionEstablished Crohn’s diseaseManaged with IV corticosteroids and infliximab infusionsGastroenterology follow-up in 2 weeksAvoid triggers, medication compliance, hydratedCrohn’s disease acute flareStable at dischargePrescribed oral steroids, infliximab infusion appointments. LowModerateSeverity based on condition ‘Crohn’s disease flare’. Readmission probability based on discharge condition ‘Stable at discharge’ and established Crohn’s disease.
Ethan Miller61Male12/8/202112/18/2021Dr. AdamsDr. BarnesLung CancerChemotherapyTolerating chemotherapy with manageable side effectsNo significant past medical historyPatient initiated on chemotherapy regimenOncology follow-up in 1 weekAdequate hydration, medication complianceFinal diagnosis of lung cancerStable at dischargePrescribed anti-emetics and pain management regimen.LowLowSeverity based on condition ‘Lung Cancer’. Readmission probability based on discharge condition ‘Stable at discharge’ and no significant past medical history.
Emily Roberts67Female11/26/202112/10/2021Dr. JacksonDr. ThompsonAcute renal failureDialysisRenal function improved with dialysis, kidney function partially restoredPast history of hypertension and diabetesTreated with intermittent hemodialysis and managed blood pressure and glucoseNephrology follow-up in 1 weekLow sodium and potassium diet, medication complianceFinal diagnosis of acute renal failureImproved at dischargePrescribed antihypertensives, insulin, and dialysis prescription.LowModerateSeverity based on condition ‘Acute renal failure’. Readmission probability based on discharge condition ‘Improved at discharge’ and past history of hypertension and diabetes.
Joseph Garcia80Male10/5/202110/15/2021Dr. PhillipsDr. CampbellChronic heart failure exacerbationDiuretics, ACE inhibitors, Beta-blockersSymptoms improved with medication adjustmentLong-standing heart failure, prior myocardial infarctionManaged with increase in diuretic dose, blood pressure controlCardiology follow-up in 1 weekLow sodium diet, daily weight monitoring, medication complianceChronic heart failure exacerbationStable at dischargePrescribed diuretics, ACE inhibitors, beta-blockers.LowModerateSeverity based on condition ‘Chronic heart failure exacerbation’. Readmission probability based on discharge condition ‘Stable at discharge’ and long-standing heart failure, prior myocardial infarction.
Mia Wong28Female7/22/20217/31/2021Dr. EvansDr. RogersThyroiditisThyroid hormone replacement therapyThyroid hormone levels returned to normalNo significant medical historyManaged with thyroid hormone replacement therapyEndocrinology follow-up in 1 monthMedication complianceFinal diagnosis of subacute thyroiditisStable at dischargeLevothyroxine.LowModerateSeverity based on condition ‘Thyroiditis’. Readmission probability based on discharge condition ‘Stable at discharge’ and no significant medical history.
Isaac Perry46Male9/25/202110/1/2021Dr. RossDr. GriffinCellulitisIV antibiotics followed by oral antibioticsInfection resolved with treatmentNo significant medical historyPatient treated with IV then oral antibioticsFollow-up with primary care in 1 weekComplete antibiotic course, local wound careFinal diagnosis of cellulitisResolved at dischargeOral antibiotics.LowModerateSeverity based on condition ‘Cellulitis’. Readmission probability based on discharge condition ‘Resolved at discharge’ and no significant medical history.
Sophia Lewis75Female8/2/20218/11/2021Dr. KennedyDr. DunnCongestive heart failure exacerbationDiuretics, dietary adjustmentsSymptoms improved with treatmentHistory of coronary artery diseaseManaged with medication optimization and dietary adviceCardiology follow-up in 2 weeksLow sodium diet, medication adherenceCongestive Heart Failure ExacerbationStable at dischargePrescribed loop diuretics, ACE inhibitors, and beta blockers. LowModerateSeverity based on condition ‘Congestive heart failure exacerbation’. Readmission probability based on discharge condition ‘Stable at discharge’ and history of coronary artery disease.
Grace Foster61Female4/23/20214/30/2021Dr. ReedDr. KlineChronic Kidney DiseaseDialysisStable under dialysis treatmentHistory of diabetes and hypertensionUnderwent dialysis and optimized blood pressure controlNephrology follow-up in 1 weekLow sodium diet, medication complianceChronic Kidney Disease Stage 5Stable at dischargePrescribed antihypertensive, erythropoiesis-stimulating agents.LowModerateSeverity based on condition ‘Chronic Kidney Disease’. Readmission probability based on discharge condition ‘Stable at discharge’ and history of diabetes and hypertension.
Noah Butler65Male10/1/202110/12/2021Dr. WellsDr. PerezCOPD ExacerbationCorticosteroids, bronchodilatorsBreathing improved noticeablyLong-standing COPD, ex-smokerManaged with nebulized bronchodilators and systemic corticosteroidsPulmonary follow-up in 1 monthSmoking cessation, use inhalers as instructedAcute COPD exacerbationStable at dischargePrescribed inhalers and a short course of oral steroids.LowModerateSeverity based on condition ‘COPD Exacerbation’. Readmission probability based on discharge condition ‘Stable at discharge’ and long-standing COPD, ex-smoker.
Eleanor Barnes50Female9/10/20219/16/2021Dr. StevensDr. RiveraRheumatoid Arthritis FlareSteroids and NSAIDsPain and swelling reduced significantlyLong-standing Rheumatoid ArthritisManaged with increase in steroids and NSAIDsRheumatology follow-up in 2 weeksGentle exercise, joint care, medication complianceAcute Rheumatoid Arthritis flareStable at dischargePrescribed steroids and NSAIDs.LowModerateSeverity based on condition ‘Rheumatoid Arthritis Flare’. Readmission probability based on discharge condition ‘Stable at discharge’ and long-standing Rheumatoid Arthritis.
Lucas Peterson78Male3/26/20214/1/2021Dr. McDonaldDr. BakerGouty ArthritisColchicine, AllopurinolGout attack settled, and uric acid loweredHistory of recurrent Gout attacksManaged with acute gout treatment and urate-lowering therapyFollow-up with Rheumatologist in 2 weeksLow purine diet, avoid alcohol, medication complianceFinal diagnosis of Gouty ArthritisStable at dischargePrescribed colchicine and allopurinol.LowModerateSeverity based on condition ‘Gouty Arthritis’. Readmission probability based on discharge condition ‘Stable at discharge’ and history of recurrent Gout attacks.
Sophie Duncan23Female5/30/20216/2/2021Dr. BryantDr. ColemanAcute appendicitisLaparoscopic appendectomyExcellent recovery with no complicationsPreviously healthySuccessfully underwent laparoscopic appendectomyGeneral Surgery follow-up in 2 weeksCare of operative site, resume regular activity as toleratedAcute appendicitisStable at dischargeAnalgesics, wound care recommendations.LowModerateSeverity based on condition ‘Acute appendicitis’. Readmission probability based on discharge condition ‘Stable at discharge’ and being previously healthy.
Samuel Larson71Male12/15/202112/22/2021Dr. FosterDr. CraigPneumoniaAntibiotics, respiratory supportResponse to antibiotics with improved breathingHistory of COPDReceived IV antibiotics and supplemental oxygenFollow-up with Pulmonologist in 4 weeksTake medications as prescribed, rest and adequate nutritionPneumoniaStable at dischargeOral antibiotics for completing course.ModerateModerateSeverity based on condition ‘Pneumonia’. Readmission probability based on discharge condition ‘Stable at discharge’ and history of COPD.
Sarah Woods58Female11/11/202111/17/2021Dr. RomeroDr. JacobsBreast CancerLumpectomy and sentinal lymph node biopsyNo complications with satisfactory recoveryNo significant historyProcedure went without any complications, pathological report awaitedFollow-up with Oncologist in 1 weekIncision care, avoid physical exertionBreast CancerStable at dischargePain management medications.LowModerateSeverity based on condition ‘Breast Cancer’. Readmission probability based on discharge condition ‘Stable at discharge’ and no significant past medical history.
Jack Hudson46Male10/24/202110/31/2021Dr. PaulDr. BakerGastric ulcersProton pump inhibitors, dietary modificationsSymptoms improved significantly with treatmentHistory of NSAID useManaged with PPI therapy and dietary adviceGastroenterology follow-up in 1 monthAvoid spicy food, alcohol, smoking, medication adherenceGastric ulcerStable at dischargeOmeprazole, Sucralfate.LowModerateSeverity based on condition ‘Gastric ulcers’. Readmission probability based on discharge condition ‘Stable at discharge’ and history of NSAID use.
Ivy Johnson80Female9/3/20219/15/2021Dr. JacksonDr. RileyStroke rehabilitationPhysical and occupational therapyGradual improvement with still residual weaknessPast history of hypertension and diabetesUnderwent intensive rehabilitation therapyFollow-up with Outpatient Rehab and Neurologist in 4 weeksPhysiotherapy, medication complianceStroke with right hemiparesisStable at dischargeAntihypertensives, oral antidiabetics, aspirin.LowModerateSeverity based on condition ‘Stroke rehabilitation’. Readmission probability based on discharge condition ‘Stable at discharge’ and history of hypertension and diabetes.
Elijah Myers55Male11/3/202111/10/2021Dr. AyersDr. HarlowPancreatitisIV fluids, pain management, and dietary adjustmentsSymptoms improved significantlyHistory of alcohol abuseManaged with IV fluids, pain management, and alcohol detoxGastroenterology and Addiction specialist follow-up in 1 weekTotal abstinence from alcohol, low-fat diet, medication complianceAlcohol-induced pancreatitisImproved at dischargePrescribed pain killers, pancreatic enzymes, and detox medications.LowModerateSeverity based on condition ‘Pancreatitis’. Readmission probability based on discharge condition ‘Improved at discharge’ and history of alcohol abuse.
Hannah Peters36Female10/11/202110/20/2021Dr. MadisonDr. TurnerUncontrolled Type 1 DiabetesInsulin regulation, diet and lifestyle changesBlood sugar levels returned to normalLong-standing diabetesManagement involved adjustment of insulin dose and dietary adviceEndocrinology follow-up in 1 weekRegular monitoring, maintain balanced diet, regular exerciseUncontrolled Type 1 DiabetesStable at dischargeInsulin as per optimized prescription.ModerateModerateSeverity based on condition ‘Uncontrolled Type 1 Diabetes’. Readmission probability based on discharge condition ‘Stable at discharge’ and long-standing diabetes.
William Riley72Male7/22/20217/29/2021Dr. HowardDr. JenkinsChronic Obstructive Pulmonary disease exacerbationOxygen therapy, steroids, and antibioticsBreathing normalized, chest clearingHistory of smoking and COPDManaged with nebulizers, steroids, and antibioticsPulmonary follow-up in 2 weeksSmoking cessation, use inhalers as instructedCOPD exacerbationStable at dischargeInhalers, steroids, and antibiotics.LowModerateSeverity based on condition ‘Chronic Obstructive Pulmonary disease exacerbation’. Readmission probability based on discharge condition ‘Stable at discharge’ and history of smoking and COPD.
Lucy Foster46Female9/15/20219/30/2021Dr. ReeseDr. CastilloBreast CancerChemotherapyModerate side effects managedNo significant family historyCommencement of chemotherapy regimenOncology follow-up in 1 weekHealthy diet, gentle exercise, medication complianceBreast Cancer, stage IIbStable at dischargePrescribed antiemetic and analgesic.LowModerateSeverity based on condition ‘Breast Cancer’. Readmission probability based on discharge condition ‘Stable at discharge’ and no significant family history.
Oliver Shaw35Male4/27/20215/2/2021Dr. PiperDr. ShawFracture tibiaOpen reduction and internal fixationRecovery as expected, mobilizing with supportNo significant medical historySmooth surgery, recovery in ward until independent mobilization achievedOrthopedic follow-up in 1 weekWeight-bearing as per advice, rest, elevate limbTibia fractureStable at dischargeAnalgesics, anticoagulant.LowModerateSeverity based on condition ‘Fracture tibia’. Readmission probability based on discharge condition ‘Stable at discharge’ and no significant medical history.
Stella Rogers55Female8/21/20218/31/2021Dr. SparksDr. KennedyVasculitisSteroids and immunosuppressantsSymptoms improved significantlyNo significant medical historyManaged with steroids and immunosuppressantsRheumatology follow-up in 2 weeksMedication compliance, regular follow ups, report any new symptomsVasculitisStable at dischargeCorticosteroids, immunosuppressants.LowModerateSeverity based on condition ‘Vasculitis’. Readmission probability based on discharge condition ‘Stable at discharge’ and no significant medical history.
Liam Griffin81Male7/25/20218/8/2021Dr. PattersonDr. PhillipsPneumoniaAntibiotics and supportive careCondition improved significantlyHistory of diabetes, hypertensionTreated with IV antibiotics and oxygen therapyPulmonology follow-up in 3 weeksMedication compliance, smoking cessationPneumoniaStable at dischargeOral antibiotics to complete course.ModerateModerateSeverity based on condition ‘Pneumonia’. Readmission probability based on discharge condition ‘Stable at discharge’ and history of diabetes, hypertension.
Hazel Ortiz45Female11/10/202111/18/2021Dr. SnyderDr. HamiltonSevere AnemiaBlood transfusion, iron supplementsBlood levels normalizedHistory of heavy menstrual bleedingFluid resuscitation and blood transfusions were givenGynecology follow-up in 1 weekOral iron supplements, balanced dietSevere Iron Deficiency AnemiaStable at dischargeIron supplement, analgesic.LowModerateSeverity based on condition ‘Severe Anemia’. Readmission probability based on discharge condition ‘Stable at discharge’ and history of heavy menstrual bleeding.
Levi Cooper63Male1/15/20211/20/2021Dr. BowmanDr. FrancisGastrointestinal bleedingEndoscopy, Clipping of bleeding ulcerBleeding stopped, stable conditionHistory of chronic NSAID useEndoscopic intervention was successful without complicationsGastroenterology follow-up in 1 weekAvoid NSAIDs and alcohol, medication compliancePeptic Ulcer Disease with bleedingStable at dischargeProton pump inhibitors.LowModerateSeverity based on condition ‘Gastrointestinal bleeding’. Readmission probability based on discharge condition ‘Stable at discharge’ and history of chronic NSAID use.
Lily Rogers78Female6/8/20216/15/2021Dr. DeanDr. FosterChronic Kidney Disease progressionDialysis initiationStable after starting dialysisHistory of diabetes, Chronic Kidney DiseaseInitiated on dialysisNephrology follow-up in 1 weekMedication compliance, appropriate dietEnd-Stage Renal DiseaseStable at dischargeAntihypertensives, erythropoiesis-stimulating agents, phosphate binders.LowModerateSeverity based on condition ‘Chronic Kidney Disease progression’. Readmission probability based on discharge condition ‘Stable at discharge’ and history of diabetes, Chronic Kidney Disease.
Noah Barnes48Male8/1/20218/8/2021Dr. RamirezDr. HughesBell’s PalsyCorticosteroids, Physical therapySlow return of facial movementNo significant medical historyManaged with corticosteroids and physical therapyNeurology follow-up in 1 monthFacial muscle exercises, medication complianceBell’s PalsyImproving at dischargePrescribed corticosteroids, antivirals.LowModerateSeverity based on condition ‘Bell’s Palsy’. Readmission probability based on discharge condition ‘Improving at discharge’ and no significant medical history.
Emily Foster34Female1/22/20211/27/2021Dr. AdamsDr. BarnesAppendicitisAppendectomyExcellent recovery post-surgeryNo significant medical historyUnderwent routine open appendectomyFollow-up with surgeon in 2 weeksWound care, report any fever or wound dischargeAcute appendicitisStable at dischargePrescribed painkillers and absorption.LowModerateSeverity based on condition ‘Appendicitis’. Readmission probability based on discharge condition ‘Stable at discharge’ and no significant medical history.
Ethan Johnson45Male3/21/20214/5/2021 Dr. RobertsDr. EdwardsColon cancerResection of colon cancer, start of adjuvant chemotherapyDisease under control, tolerated chemo wellNo significant past medical historyComplete tumor resection achieved with histology confirming marginsOncologist follow-up in 2 weeksHealthy diet, regular exercise, medication complianceColon cancer stage IIIStable at dischargePrescribed chemotherapeutics, antiemetics.LowLowSeverity based on condition ‘Colon cancer’. Readmission probability based on discharge condition ‘Stable at discharge’ and no significant past medical history.
Sophia James24Female11/5/202111/15/2021Dr. JacobsDr. WillisSevere Asthma AttackIntravenous corticosteroids, nebulizer treatmentsBreathing eased, symptoms improvedLifetime AsthmaHospitalized for acute asthma managementPulmonary follow-up in 1 weekAvoid asthma triggers, regular use of control medicationAcute severe asthma attack, AsthmaStable at dischargeInhalers, oral corticosteroids for a short course.LowModerateSeverity based on condition ‘Severe Asthma Attack’. Readmission probability based on discharge condition ‘Stable at discharge’ and lifetime asthma.
Jacob Owens58Male10/7/202110/14/2021Dr. GriffinDr. PattersonPeptic ulcer diseaseProton pump inhibitors, H. pylori eradicationSymptoms improved significantlyNo significant past medical historyReceived treatment for H. pylori and proton pump inhibitorsGastroenterology follow-up in 1 monthAvoid NSAIDs, alcohol, spicy foods; take medications with mealsPeptic ulcer diseaseStable at dischargePrescribed proton-pump inhibitors.LowLowSeverity based on condition ‘Peptic ulcer disease’. Readmission probability based on discharge condition ‘Stable at discharge’ and no significant past medical history.
Layla Tyler63Female8/16/20218/28/2021Dr. EllisDr. FosterCongestive Heart FailureDiuretics, vasodilators, beta-blockersSymptoms improved with stabilizationHypertensionAdjusted medication regimen; patient education about fluid intake and weight monitoringCardiology follow-up in 4 weeksMedication compliance, daily weight, low sodium dietCongestive Heart FailureStable at dischargePrescribed diuretics, vasodilators, beta-blockers.LowModerateSeverity based on condition ‘Congestive Heart Failure’. Readmission probability based on discharge condition ‘Stable at discharge’ and hypertension.
Max Peters46Male3/12/20213/18/2021Dr. KingDr. HowardPneumothoraxChest tube insertionChest re-expanded successfullyNo significant past medical historyUnderwent chest tube insertion for pneumothoraxPulmonary follow-up in 2 weeksAvoid heavy lifting, short flights for 2 weeksSpontaneous PneumothoraxStable at dischargeAnalgesics, follow up as directed.LowLowSeverity based on condition ‘Pneumothorax’. Readmission probability based on discharge condition ‘Stable at discharge’ and no significant past medical history.
Harper Davis71Female5/30/20216/6/2021Dr. RossDr. HollandCOPD ExacerbationBronchodilators, steroidsBreathing improved noticeablyCOPD, ex-smokerManaged with nebulized bronchodilators and oral steroidsPulmonary follow-up in 2 weeksSmoking cessation, use inhalers as instructedCOPD exacerbationStable at dischargeInhalers, oral steroid taper.LowModerateSeverity based on condition ‘COPD Exacerbation’. Readmission probability based on discharge condition ‘Stable at discharge’ and COPD, ex-smoker history.
Thomas Mitchell79Male5/10/20215/21/2021Dr. BarrettDr. OsborneHeart failureDiuretics, beta-blockers, ACE inhibitorsCondition improved significantly with managementHistory of ischemic heart disease, hypertensionManaged with heart failure medications, fluid restrictionCardiology follow-up in 2 weeksLow salt diet, fluid restriction, medication complianceCongestive heart failureStable at dischargeFurosemide, lisinopril, carvedilol.LowModerateSeverity based on condition ‘Heart failure’. Readmission probability based on discharge condition ‘Stable at discharge’ and history of ischemic heart disease, hypertension.
Emily Ross43Female2/14/20212/21/2021Dr. HamiltonDr. JenkinsCholecystitisCholecystectomyRecovery without complicationsNo significant past medical historyUnderwent laparoscopic cholecystectomySurgery follow-up in 2 weeksGradual increase in diet, wound careCholecystitisStable at dischargeAnalgesics, wound care recommendations.LowLowSeverity based on condition ‘Cholecystitis’. Readmission probability based on discharge condition ‘Stable at discharge’ and no significant past medical history.
Oliver Hall27Male7/18/20217/22/2021Dr. WashingtonDr. MurrayMeningitisAntibiotics, steroidsSymptoms resolved notablyNo significant past medical historyManaged with IV antibiotics and supportive careNeurology follow-up in 2 weeksRest, hydration, antibiotic complianceMeningitisStable at dischargeContinuation of oral antibiotics and analgesics.LowLowSeverity based on condition ‘Meningitis’. Readmission probability based on discharge condition ‘Stable at discharge’ and no significant past medical history.
Abigail Jackson65Female12/1/202112/10/2021Dr. JenkinsDr. ThompsonStrokeThrombolytic therapy, rehabilitationPartial resolution of deficitsHypertension, diabetesUnderwent IV thrombolysis and rehabilitationNeurology and rehabilitation follow-up in 1 monthPhysiotherapy, medication compliance, lifestyle modificationsIschemic strokeModerate impairment at dischargeAntihypertensives, antidiabetics, anticoagulation.LowModerateSeverity based on condition ‘Stroke’. Readmission probability based on moderate impairment at discharge and history of hypertension, diabetes.
Jackson Perez54Male6/30/20217/6/2021Dr. AdamsDr. CollinsPeptic Ulcer DiseaseProton pump inhibitors, H. pylori eradicationSymptoms markedly improvedPast history of smoking, alcohol useManaged with proton pump inhibitors and H. pylori eradication therapyGastroenterology follow-up in 4 weeksMedication compliance, lifestyle modification, stop alcohol and smokingPeptic Ulcer DiseaseStable at dischargeAntibiotics for H.pylori, PPIs.LowModerateSeverity based on condition ‘Peptic Ulcer Disease’. Readmission probability based on stable condition at discharge and past history of smoking, alcohol use.
Sophia Kline31Female2/2/20212/7/2021Dr. BaileyDr. BellPyelonephritisIV antibiotics followed by oral antibiotics therapySymptoms resolved significantlyNo significant past medical historyManaged with IV antibiotics followed by switch to oralPrimary care follow-up in 2 weeksHydration, avoid delaying urination, antibiotic compliancePyelonephritisStable at dischargeOral antibiotics to complete 14 days course.LowLowSeverity based on condition ‘Pyelonephritis’. Readmission probability based on stable condition at discharge and no significant past medical history.
Grayson Walker32Male3/18/20213/25/2021Dr. RodriguezDr. WebbAppendicitisAppendectomyExcellent recovery with no complicationsNo significant medical historyUnderwent appendectomy without complicationsSurgery follow-up in 2 weeksResume normal diet gradually, wound care, report any feverAppendicitisStable at dischargeAnalgesics.LowModerateSeverity based on condition ‘Appendicitis’. Readmission probability based on stable condition at discharge and no significant medical history.
Aria Harper73Female11/12/202111/30/2021Dr. SnyderDr. WalshHeart failureDiuretics, lifestyle modificationSymptoms improved notablyHistory of Hypertension, DiabetesManaged with diuretics and lifestyle modification adviceCardiology follow-up in 1 monthWeight monitoring, low salt diet, exercise, medication complianceCongestive Heart FailureStable at dischargePrescribed diuretics, ACE inhibitors, and beta-blockersLowModerateSeverity based on condition ‘Heart failure’. Readmission probability based on stable condition at discharge and history of Hypertension, Diabetes.

References

  1. Gilbert, A.; Marciniak, M.; Rodero, C.; Lamata, P.; Samset, E.; Mcleod, K. Generating Synthetic Labeled Data from Existing Anatomical Models: An Example with Echocardiography Segmentation. IEEE Trans. Med. Imaging 2021, 40, 2783–2794. [Google Scholar] [CrossRef] [PubMed]
  2. Aouedi, O.; Sacco, A.; Piamrat, K.; Marchetto, G. Handling Privacy-Sensitive Medical Data With Federated Learning: Challenges and Future Directions. IEEE J. Biomed. Health Inform. 2022, 27, 790–803. [Google Scholar] [CrossRef]
  3. Elbadawi, M.; Li, H.; Basit, A.W.; Gaisford, S. The role of artificial intelligence in generating original scientific research. Int. J. Pharm. 2024, 652, 123741. [Google Scholar] [CrossRef] [PubMed]
  4. Van Nooten, J.; Daelemans, W. Improving Dutch Vaccine Hesitancy Monitoring via Multi-Label Data Augmentation with GPT-3.5. In Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis, Toronto, ON, Canada, 14 July 2023; Available online: https://openai.com/blog/chatgpt (accessed on 21 April 2024).
  5. Zhou, S.; Zhang, Y. DATLMedQA: A data augmentation and transfer learning based solution for medical question answering. Appl. Sci. 2021, 11, 11251. [Google Scholar] [CrossRef]
  6. Hämäläinen, P.; Tavast, M.; Kunnari, A. Evaluating Large Language Models in Generating Synthetic HCI Research Data: A Case Study. In Proceedings of the Conference on Human Factors in Computing Systems, Hamburg, Germany, 23–28 April 2023; Association for Computing Machinery: New York, NY, USA, 2023. [Google Scholar] [CrossRef]
  7. Lu, Q.; Dou, D.; Nguyen, T.H. Textual Data Augmentation for Patient Outcomes Prediction. In Proceedings of the 2021 IEEE international conference on bioinformatics and biomedicine (BIBM), Houston, TX, USA, 9–12 December 2021. [Google Scholar] [CrossRef]
  8. Bird, J.J.; Pritchard, M.G.; Fratini, A.; Ekart, A.; Faria, D.R. Synthetic Biological Signals Machine-Generated by GPT-2 Improve the Classification of EEG and EMG through Data Augmentation. IEEE Robot. Autom. Lett. 2021, 6, 3498–3504. [Google Scholar] [CrossRef]
  9. Amin-Nejad, A.; Ive, J.; Velupillai, S. Exploring Transformer Text Generation for Medical Dataset Augmentation. In Proceedings of the Twelfth Language Resources and Evaluation Conference, Marseille, France, 11–16 May 2020; Available online: https://github.com/tensorflow/tensor2tensor (accessed on 21 April 2024).
  10. Thamsen, B.; Yevtushenko, P.; Gundelwein, L.; Setio, A.A.A.; Lamecker, H.; Kelm, M.; Schafstedde, M.; Heimann, T.; Kuehne, T.; Goubergrits, L. Synthetic Database of Aortic Morphometry and Hemodynamics: Overcoming Medical Imaging Data Availability. IEEE Trans. Med. Imaging 2021, 40, 1438–1449. [Google Scholar] [CrossRef] [PubMed]
  11. Ruksakulpiwat, S.; Kumar, A.; Ajibade, A. Using ChatGPT in Medical Research: Current Status and Future Directions. J. Multidiscip. Health 2023, 16, 1513–1520. [Google Scholar] [CrossRef]
  12. Mahuli, S.A.; Rai, A.; Mahuli, A.V.; Kumar, A. Application ChatGPT in conducting systematic reviews and meta-analyses. Br. Dent. J. 2023, 235, 90–92. [Google Scholar] [CrossRef] [PubMed]
  13. Cai, X.; Geng, Y.; Du, Y.; Westerman, B.; Wang, D.; Ma, C.; Vallejo, J.J.G. Utilizing ChatGPT to select literature for meta-analysis shows workload reduction while maintaining a similar recall level as manual curation. medRxiv 2023. [Google Scholar] [CrossRef]
  14. De Angelis, L.; Baglivo, F.; Arzilli, G.; Privitera, G.P.; Ferragina, P.; Tozzi, A.E.; Rizzo, C. ChatGPT and the rise of large language models: The new AI-driven infodemic threat in public health. Front. Public Health 2023, 11, 1166120. [Google Scholar] [CrossRef]
  15. Reddy, S. Evaluating large language models for use in healthcare: A framework for translational value assessment. Inform. Med. Unlocked 2023, 41, 101304. [Google Scholar] [CrossRef]
  16. Alberts, I.L.; Mercolli, L.; Pyka, T.; Prenosil, G.; Shi, K.; Rominger, A.; Afshar-Oromieh, A. Large language models (LLM) and ChatGPT: What will the impact on nuclear medicine be? Eur. J. Nucl. Med. 2023, 50, 1549–1552. [Google Scholar] [CrossRef] [PubMed]
  17. Chatterjee, S.; Bhattacharya, M.; Pal, S.; Lee, S.; Chakraborty, C. ChatGPT and large language models in orthopedics: From education and surgery to research. J. Exp. Orthop. 2023, 10, 1–10. [Google Scholar] [CrossRef] [PubMed]
  18. Sallam, M. ChatGPT Utility in Healthcare Education, Research, and Practice: Systematic Review on the Promising Perspectives and Valid Concerns. Healthcare 2023, 11, 887. [Google Scholar] [CrossRef]
  19. Lim, S.; Schmälzle, R. Artificial intelligence for health message generation: An empirical study using a large language model (LLM) and prompt engineering. Front. Commun. 2023, 8, 1129082. [Google Scholar] [CrossRef]
  20. Waisberg, E.; Ong, J.; Kamran, S.A.; Masalkhi, M.; Zaman, N.; Sarker, P.; Lee, A.G.; Tavakkoli, A. Bridging artificial intelligence in medicine with generative pre-trained transformer (GPT) technology. J. Med. Artif. Intell. 2023, 6, 13. [Google Scholar] [CrossRef]
  21. Maddigan, P.; Susnjak, T. Chat2VIS: Generating Data Visualizations via Natural Language Using ChatGPT, Codex and GPT-3 Large Language Models. IEEE Access 2023, 11, 45181–45193. [Google Scholar] [CrossRef]
  22. Lengerich, B.J.; Bordt, S.; Nori, H.; Nunnally, M.E.; Aphinyanaphongs, Y.; Kellis, M.; Caruana, R. LLMs Understand Glass-Box Models, Discover Surprises, and Suggest Repairs. arXiv 2023, arXiv:2308.01157. [Google Scholar]
  23. Sharma, A.; Devalia, D.; Almeida, W.; Patil, H.; Mishra, A. Statistical Data Analysis using GPT3: An Overview. In Proceedings of the 2022 IEEE Bombay Section Signature Conference (IBSSC), Mumbai, India, 8–10 December 2022; IEEE: Piscataway, NJ, USA, 2022. [Google Scholar]
  24. Espejel, J.L.; Ettifouri, E.H.; Alassan, M.S.Y.; Chouham, E.M.; Dahhane, W. GPT-3.5, GPT-4, or BARD? Evaluating LLMs reasoning ability in zero-shot setting and performance boosting through prompts. Nat. Lang. Process. J. 2023, 5, 100032. [Google Scholar] [CrossRef]
  25. de Kok, T. Generative LLMs and Textual Analysis in Accounting: (Chat)GPT as Research Assistant? 2023. Available online: https://ssrn.com/abstract=4429658 (accessed on 21 April 2024).
  26. Yenduri, G.; Srivastava, G.; Maddikunta, P.K.R.; Jhaveri, R.H.; Wang, W.; Vasilakos, A.V.; Gadekallu, T.R. Generative Pre-trained Transformer: A Comprehensive Review on Enabling Technologies, Potential Applications, Emerging Challenges, and Future Directions. arXiv 2023, arXiv:2305.10435. [Google Scholar] [CrossRef]
  27. Sufi, F.K.; Alsulami, M.; Gutub, A. Automating Global Threat-Maps Generation via Advancements of News Sensors and AI. Arab. J. Sci. Eng. 2022, 48, 2455–2472. [Google Scholar] [CrossRef]
  28. Sufi, F. Social Media Analytics on Russia–Ukraine Cyber War with Natural Language Processing: Perspectives and Challenges. Information 2023, 14, 485. [Google Scholar] [CrossRef]
  29. Sufi, F.K.; Razzak, I.; Khalil, I. Tracking Anti-Vax Social Movement Using AI-Based Social Media Monitoring. IEEE Trans. Technol. Soc. 2022, 3, 290–299. [Google Scholar] [CrossRef]
  30. Sufi, F.K.; Khalil, I. Automated Disaster Monitoring From Social Media Posts Using AI-Based Location Intelligence and Sentiment Analysis. IEEE Trans. Comput. Soc. Syst. 2022. [Google Scholar] [CrossRef]
  31. Sufi, F.K. AI-SocialDisaster: An AI-based software for identifying and analyzing natural disasters from social media. Softw. Impacts 2022, 13, 100319. [Google Scholar] [CrossRef]
  32. Sufi, F.K. A decision support system for extracting artificial intelligence-driven insights from live twitter feeds on natural disasters. Decis. Anal. J. 2022, 5, 100130. [Google Scholar] [CrossRef]
  33. Sufi, F.K.; Alsulami, M. Automated Multidimensional Analysis of Global Events with Entity Detection, Sentiment Analysis and Anomaly Detection. IEEE Access 2021, 9, 152449–152460. [Google Scholar] [CrossRef]
  34. Sufi, F. Algorithms in Low-Code-No-Code for Research Applications: A Practical Review. Algorithms 2023, 16, 108. [Google Scholar] [CrossRef]
  35. Balaji, S.; Magar, R.; Jadhav, Y.; Farimani, A.B. GPT-MolBERTa: GPT Molecular Features Language Model for molecular property prediction. arXiv 2023, arXiv:2310.03030. [Google Scholar]
  36. Hu, Y.; Mai, G.; Cundy, C.; Choi, K.; Lao, N.; Liu, W.; Lakhanpal, G.; Zhou, R.Z.; Joseph, K. Geo-knowledge-guided GPT models improve the extraction of location descriptions from disaster-related social media messages. Int. J. Geogr. Inf. Sci. 2023, 37, 2289–2318. [Google Scholar] [CrossRef]
  37. Maimaiti, M.; Liu, Y.; Luan, H.; Sun, M. Data augmentation for low-resource languages NMT guided by constrained sampling. Int. J. Intell. Syst. 2021, 37, 30–51. [Google Scholar] [CrossRef]
  38. Suhaeni, C.; Yong, H.-S. Mitigating Class Imbalance in Sentiment Analysis through GPT-3-Generated Synthetic Sentences. Appl. Sci. 2023, 13, 9766. [Google Scholar] [CrossRef]
  39. Romero-Sandoval, M.; Calderón-Ramírez, S.; Solís, M. Using GPT-3 as a Text Data Augmentator for a Complex Text Detector. In Proceedings of the 2023 IEEE 5th International Conference on BioInspired Processing (BIP), San Carlos, Alajuela, Costa Rica, 28–30 November 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–6. [Google Scholar] [CrossRef]
  40. Cohen, S.; Presil, D.; Katz, O.; Arbili, O.; Messica, S.; Rokach, L. Enhancing social network hate detection using back translation and GPT-3 augmentations during training and test-time. Inf. Fusion 2023, 99, 101887. [Google Scholar] [CrossRef]
  41. Rebboud, Y.; Lisena, P.; Troncy, R. Prompt-based Data Augmentation for Semantically-Precise Event Relation Classification. In Proceedings of the 2023 IEEE 5th International Conference on BioInspired Processing (BIP), San Carlos, Alajuela, Costa Rica, 28–30 November 2023; Available online: http://ceur-ws.org (accessed on 21 April 2024).
  42. Grasler, I.; Preus, D.; Brandt, L.; Mohr, M. Efficient Extraction of Technical Requirements Applying Data Augmentation. In Proceedings of the ISSE 2022–2022 8th IEEE International Symposium on Systems Engineering, Vienna, Austria, 24–26 October 2022; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2022. [Google Scholar] [CrossRef]
  43. Singh, C.; Askari, A.; Caruana, R.; Gao, J. Augmenting interpretable models with large language models during training. Nat. Commun. 2023, 14, 7913. [Google Scholar] [CrossRef]
  44. Modzelewski, A.; Sosnowski, W.; Wilczynska, M.; Wierzbicki, A. DSHacker at SemEval-2023 Task 3: Genres and Persuasion Techniques Detection with Multilingual Data Augmentation through Machine Translation and Text Generation. In Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), Toronto, ON, Canada, 13–14 July 2023; Available online: https://semeval.github.io/SemEval2023/ (accessed on 21 April 2024).
  45. Hong, X.-S.; Wu, S.-H.; Tian, M.; Jiang, J. CYUT at the NTCIR-16 FinNum-3 Task: Data Resampling and Data Augmentation by Generation. In Proceedings of the 16th NTCIR Conference on Evaluation of Information Access Technologies, Tokyo, Japan, 14–17 June 2022; Available online: https://huggingface.co/docs/transformers/main (accessed on 21 April 2024).
  46. Khatri, S.; Iqbal, M.; Ubakanma, G.; van der Vliet-Firth, S. SkillBot: Towards Data Augmentation using Transformer language model and linguistic evaluation. In Proceedings of the 2022 International Conference on Human-Centered Cognitive Systems, HCCS 2022, Shanghai, China, 17–18 December 2022; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2022. [Google Scholar] [CrossRef]
  47. Vogel, L.; Flek, L. Investigating Paraphrasing-Based Data Augmentation for Task-Oriented Dialogue Systems. In Lecture Notes in Computer Science (including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer Science and Business Media Deutschland GmbH: Berlin/Heidelberg, Germany, 2022; pp. 476–488. [Google Scholar] [CrossRef]
  48. Casula, C.; Tonelli, S.; Kessler, F.B. Generation-Based Data Augmentation for Offensive Language Detection: Is It Worth It? In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, Dubrovnik, Croatia, 2–6 May 2023; Available online: https://github.com/dhfbk/annotators-agreement-dataset (accessed on 21 April 2024).
  49. Pouran, A.; Veyseh, B.; Dernoncourt, F.; Min, B.; Nguyen, T.H. Generating Complement Data for Aspect Term Extraction with GPT-2. In Proceedings of the Third Workshop on Deep Learning for Low-Resource Natural Language Processing, Virtual, 14 July 2022. [Google Scholar]
  50. D’Sa, A.G.; Illina, I.; Fohr, D.; Klakow, D.; Ruiter, D. Exploring Conditional Language Model Based Data Augmentation Approaches for Hate Speech Classification. In Lecture Notes in Computer Science (including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer Science and Business Media Deutschland GmbH: Berlin/Heidelberg, Germany, 2021; pp. 135–146. [Google Scholar] [CrossRef]
  51. Meyer, S.; Elsweiler, D.; Ludwig, B.; Fernandez-Pichel, M.; Losada, D.E. Do We Still Need Human Assessors’ Prompt-Based GPT-3 User Simulation in Conversational AI. In Proceedings of the 4th Conference on Conversational User Interfaces, Glasgow, UK, 26–28 July 2022; ACM International Conference Proceeding Series; Association for Computing Machinery: New York, NY, USA, 2022. [Google Scholar] [CrossRef]
  52. Queiroz Abonizio, H.; Barbon Junior, S. Pre-trained Data Augmentation for Text Classification. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer Science and Business Media Deutschland GmbH: Berlin/Heidelberg, Germany, 2020; pp. 551–565. [Google Scholar] [CrossRef]
  53. Tapia-Téllez, J.M.; Escalante, H.J. Data Augmentation with Transformers for Text Classification. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer Science and Business Media Deutschland GmbH: Berlin/Heidelberg, Germany, 2020; pp. 247–259. [Google Scholar] [CrossRef]
  54. Hassani, H.; Silva, E.S. The Role of ChatGPT in Data Science: How AI-Assisted Conversational Interfaces Are Revolutionizing the Field. Big Data Cogn. Comput. 2023, 7, 62. [Google Scholar] [CrossRef]
  55. Nouri, N. Data Augmentation with Dual Training for Offensive Span Detection. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, DC, USA, July 2022. [Google Scholar]
  56. Bayer, M.; Kaufhold, M.-A.; Buchhold, B.; Keller, M.; Dallmeyer, J.; Reuter, C. Data augmentation in natural language processing: A novel text generation approach for long and short text classifiers. Int. J. Mach. Learn. Cybern. 2022, 14, 135–150. [Google Scholar] [CrossRef] [PubMed]
  57. Anaby-Tavor, A.; Carmeli, B.; Goldbraich, E.; Kantor, A.; Kour, G.; Shlomov, S.; Tepper, N.; Zwerdling, N. Do Not Have Enough Data? Deep Learning to the Rescue! Proc. AAAI Conf. Artif. Intell. 2020, 34, 7383–7390. [Google Scholar] [CrossRef]
  58. Quteineh, H.; Samothrakis, S.; Sutcliffe, R. Textual Data Augmentation for Efficient Active Learning on Tiny Datasets. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; Available online: https://www.snorkel.org/ (accessed on 21 April 2024).
  59. Veyseh, A.P.B.; Van Nguyen, M.; Min, B.; Nguyen, T.H. Augmenting Open-Domain Event Detection with Synthetic Data from GPT-2. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer Science and Business Media Deutschland GmbH: Berlin/Heidelberg, Germany, 2021; pp. 644–660. [Google Scholar] [CrossRef]
  60. Sawai, R.; Paik, I.; Kuwana, A. Sentence augmentation for language translation using gpt-2. Electronics 2021, 10, 3082. [Google Scholar] [CrossRef]
  61. Pellicer, L.F.A.O.; Ferreira, T.M.; Costa, A.H.R. Data augmentation techniques in natural language processing. Appl. Soft Comput. 2023, 132, 109803. [Google Scholar] [CrossRef]
  62. Chang, Y.; Zhang, R.; Pu, J. I-WAS: A Data Augmentation Method with GPT-2 for Simile Detection. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer Science and Business Media Deutschland GmbH: Berlin/Heidelberg, Germany, 2023; pp. 265–279. [Google Scholar] [CrossRef]
  63. Chen, H.; Zhang, W.; Cheng, L.; Ye, H. Diverse and High-Quality Data Augmentation Using GPT for Named Entity Recognition. In Communications in Computer and Information Science; Springer Science and Business Media Deutschland GmbH: Berlin/Heidelberg, Germany, 2023; pp. 272–283. [Google Scholar] [CrossRef]
  64. Nakamoto, R.; Flanagan, B.; Yamauchi, T.; Dai, Y.; Takami, K.; Ogata, H. Enhancing Automated Scoring of Math Self-Explanation Quality Using LLM-Generated Datasets: A Semi-Supervised Approach. Computers 2023, 12, 217. [Google Scholar] [CrossRef]
  65. Jansen, B.J.; Jung, S.-G.; Salminen, J. Employing large language models in survey research. Nat. Lang. Process. J. 2023, 4, 100020. [Google Scholar] [CrossRef]
  66. Joon, J.; Chung, Y.; Kamar, E.; Amershi, S. Increasing Diversity While Maintaining Accuracy: Text Data Generation with Large Language Models and Human Interventions. arXiv 2023, arXiv:2306.04140. [Google Scholar]
  67. Borisov, V.; Leemann, T.; Seßler, K.; Haug, J.; Pawelczyk, M.; Kasneci, G. Deep Neural Networks and Tabular Data: A Survey. IEEE Trans. Neural Networks Learn. Syst. 2022, 1–21. [Google Scholar] [CrossRef] [PubMed]
  68. Acharya, A.; Singh, B.; Onoe, N. LLM Based Generation of Item-Description for Recommendation System. In Proceedings of the 17th ACM Conference on Recommender Systems, RecSys 2023, Singapore, 18–22 September 2023; Association for Computing Machinery, Inc.: New York, NY, USA, 2023; pp. 1204–1207. [Google Scholar] [CrossRef]
  69. Narayan, A.; Chami, I.; Orr, L.; Ré, C. Can Foundation Models Wrangle Your Data? Proc. Vldb Endow. 2022, 16, 738–746. [Google Scholar] [CrossRef]
  70. Borisov, V.; Seßler, K.; Leemann, T.; Pawelczyk, M.; Kasneci, G. Language Models are Realistic Tabular Data Generators. arXiv 2022, arXiv:2210.06280. [Google Scholar]
  71. Lee, M. A Mathematical Interpretation of Autoregressive Generative Pre-Trained Transformer and Self-Supervised Learning. Mathematics 2023, 11, 2451. [Google Scholar] [CrossRef]
  72. Alahmar, A.; Mohammed, E.; Benlamri, R. Application of data mining techniques to predict the length of stay of hospitalized patients with diabetes. In Proceedings of the 2018 International Conference on Big Data Innovations and Applications, Barcelona, Spain, 6–8 August 2018; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2018; pp. 38–43. [Google Scholar] [CrossRef]
  73. Sufi, F.K. AI-GlobalEvents: A Software for analyzing, identifying and explaining global events with Artificial Intelligence. Softw. Impacts 2022, 11, 100218. [Google Scholar] [CrossRef]
Figure 1. Conceptual diagram of GPT-based training data generation, feature extraction, and labelling.
Figure 1. Conceptual diagram of GPT-based training data generation, feature extraction, and labelling.
Information 15 00264 g001
Figure 2. Six distinct areas of research for “GPT in Medical Domain”.
Figure 2. Six distinct areas of research for “GPT in Medical Domain”.
Information 15 00264 g002
Figure 3. Traditional approach of manual interaction with Chat GPT web interface vs. fully automated interaction via GPT API.
Figure 3. Traditional approach of manual interaction with Chat GPT web interface vs. fully automated interaction via GPT API.
Information 15 00264 g003
Figure 4. Microsoft Power Automate invoking API calls to GPT API in an automated manner using HTTP requests.
Figure 4. Microsoft Power Automate invoking API calls to GPT API in an automated manner using HTTP requests.
Information 15 00264 g004
Figure 5. The process of passing specially designed prompts through Microsoft Power Automate (HTTP post method). * preceding Method and URI denotes mandatory fields. (a) Generating 70 patient discharge messages. (b) Labelling each of the 70 messages with severity and chances of hospital readmission.
Figure 5. The process of passing specially designed prompts through Microsoft Power Automate (HTTP post method). * preceding Method and URI denotes mandatory fields. (a) Generating 70 patient discharge messages. (b) Labelling each of the 70 messages with severity and chances of hospital readmission.
Information 15 00264 g005
Figure 6. Synthetic patient discharge summary generated for Alex Johnson using GPT prompt.
Figure 6. Synthetic patient discharge summary generated for Alex Johnson using GPT prompt.
Information 15 00264 g006
Figure 7. Synthetic patient discharge summary generated for Sophia Martinez using GPT prompt.
Figure 7. Synthetic patient discharge summary generated for Sophia Martinez using GPT prompt.
Information 15 00264 g007
Figure 8. Synthetic patient discharge summary generated for Emily Thompson using GPT prompt.
Figure 8. Synthetic patient discharge summary generated for Emily Thompson using GPT prompt.
Information 15 00264 g008
Figure 9. Synthetic patient discharge summary generated for Michael Roberts using GPT prompt.
Figure 9. Synthetic patient discharge summary generated for Michael Roberts using GPT prompt.
Information 15 00264 g009
Figure 10. Feature extraction process using GPT for labelling the discharge messages.
Figure 10. Feature extraction process using GPT for labelling the discharge messages.
Information 15 00264 g010
Figure 11. Results of labeling patient discharge summaries with GPT.
Figure 11. Results of labeling patient discharge summaries with GPT.
Information 15 00264 g011
Figure 12. GPT-based patient discharge summary viewed and analyzed with machine learning algorithms in Samsung Galaxy S23 Ultra.
Figure 12. GPT-based patient discharge summary viewed and analyzed with machine learning algorithms in Samsung Galaxy S23 Ultra.
Information 15 00264 g012
Table 1. Categorization of existing studies on the use of GPT in medical domain (X denotes “Topic of Interest”).
Table 1. Categorization of existing studies on the use of GPT in medical domain (X denotes “Topic of Interest”).
ReferenceLiterature Review and Meta-AnalysisData Generation, Augmentation, and LabelingData AnalysisMedical Question Answering and Decision Support SystemsDrug Discovery and Clinical Trial AnalysisEthical and Public Health Implications of AI in Medicine
[11]X XXXX
[12]X
[13]X
[14] X
[15] X
[16] X
[3] X
[17] X
[18]X X X
[19] X
[4] X
[5] X
[6] X
[7] X
[8] X
[9] X
[20] X
Table 2. Seventy synthetically generated patient discharge summaries with 20 fields each.
Table 2. Seventy synthetically generated patient discharge summaries with 20 fields each.
TerminologiesData GenerationData Analysis (Labeling)Data TypeDistinct ValuesUnique ValuesExample
Patient NameX String7070John Doe
AgeX Number462734
GenderX Binary20Male
Date of AdmissionX Date62555 January 2021
Date of DischargeX Date61542 February 2021
Admitting PhysicianX String5644Dr. Smith
Discharging PhysicianX String5949Dr. Williams
Reason for AdmissionX String6053Acute appendicitis
Treatment and Surgical ProceduresX String6459Appendectomy
Patient’s Response to TreatmentX String6459Patient responded well to surgical intervention
Medical HistoryX String5145No significant past medical history
Hospital CourseX String6968Patient underwent successful appendectomy, recovered without complications
Follow-upX String5139To review in outpatient clinic after 1 week
Patient InstructionsX String6766Light diet, rest and wound care
Final DiagnosisX String6560Final diagnosis of acute appendicitis
Discharge ConditionX String128Stable at discharge
Discharge MedicationsX String6968Prescribed antibiotics, painkillers, and laxatives
Severity Level XString30Moderate
Probability of Hospital Re-admission XString30Low
Reasoning XString6968Severity based on condition ‘Acute appendicitis’. Readmission probability based on discharge condition ‘Stable at discharge’ and medical history.
Table 3. Statistics on Age field.
Table 3. Statistics on Age field.
IndexAge
count70
mean56.57142857
std17.27322753
min23
25%45
50%58
75%70.75
max89
Table 4. Evaluation of the augmented data by GPT.
Table 4. Evaluation of the augmented data by GPT.
TPTNFPFNPrecisionRecallF1-Score
Severity624310.9538460.9841270.96875
Chances of Hospital Readmission595420.9365080.9672130.951613
Reasoning633220.9692310.9692310.969231
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Sufi, F. Addressing Data Scarcity in the Medical Domain: A GPT-Based Approach for Synthetic Data Generation and Feature Extraction. Information 2024, 15, 264. https://doi.org/10.3390/info15050264

AMA Style

Sufi F. Addressing Data Scarcity in the Medical Domain: A GPT-Based Approach for Synthetic Data Generation and Feature Extraction. Information. 2024; 15(5):264. https://doi.org/10.3390/info15050264

Chicago/Turabian Style

Sufi, Fahim. 2024. "Addressing Data Scarcity in the Medical Domain: A GPT-Based Approach for Synthetic Data Generation and Feature Extraction" Information 15, no. 5: 264. https://doi.org/10.3390/info15050264

APA Style

Sufi, F. (2024). Addressing Data Scarcity in the Medical Domain: A GPT-Based Approach for Synthetic Data Generation and Feature Extraction. Information, 15(5), 264. https://doi.org/10.3390/info15050264

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop