Article

Assessing the Validity, Safety, and Utility of ChatGPT’s Responses for Patients with Frozen Shoulder

1 Department of Rehabilitation Medicine, Ewha Woman’s University Seoul Hospital, Ewha Woman’s University School of Medicine, 191 Jinheung-ro, Eunpyeong-gu, Seoul 03397, Republic of Korea
2 Department of Rehabilitation Medicine, College of Medicine, Yeungnam University, Daegu 42415, Republic of Korea
3 Department of Orthopedic Surgery, Dongbu Jeil Hospital, Mangu-ro, Jungnang-gu, Seoul 02399, Republic of Korea
4 Department of Orthopedic Surgery, Cheonho S Orthopedic Clinic, Seoul 06014, Republic of Korea
* Author to whom correspondence should be addressed.
Life 2025, 15(2), 262; https://doi.org/10.3390/life15020262
Submission received: 18 December 2024 / Revised: 31 January 2025 / Accepted: 8 February 2025 / Published: 8 February 2025
(This article belongs to the Special Issue Physical Rehabilitation for Musculoskeletal Disorders)

Abstract

This study evaluates the potential of ChatGPT as a tool for providing information to patients with frozen shoulder, focusing on its validity, safety, and utility. Five experienced physicians selected fourteen key questions on frozen shoulder after discussion and verified their adequacy by consulting one hundred and twenty patients with frozen shoulder for additional or alternative inquiries. These questions were input into ChatGPT version 4.0, and its responses were assessed by the physicians using a 5-point Likert scale, with scores ranging from 1 (least favorable) to 5 (most favorable) in terms of validity, safety, and utility. The findings showed that for validity, 85.7% of the responses scored 5 and 14.3% scored 4. For safety, 92.9% received a score of 5, while one response received a 4. Utility ratings were similarly high, with 85.7% of responses rated 5 and 14.3% rated 4. These results indicate that ChatGPT provides generally valid, safe, and useful information for patients with frozen shoulder. However, users should be aware of potential gaps or inaccuracies, and ChatGPT should not be considered a substitute for professional medical advice, diagnosis, or treatment; continued updates are necessary to ensure reliable and accurate guidance.

1. Introduction

Frozen shoulder, commonly referred to as adhesive capsulitis, is a debilitating condition that causes stiffness and pain in the shoulder joint [1,2,3,4]. It is characterized by a marked reduction in both active and passive range of motion, commonly resulting from inflammatory and fibrotic changes within the shoulder joint capsule or adjacent bursa, with subsequent contraction and thickening [5,6,7]. Given the complex pathophysiology of frozen shoulder, treatment approaches vary and are often individualized, focusing on pain relief and functional improvement. Physical therapy is an essential element of conservative treatment, aimed at improving function and reducing pain through the gradual initiation of joint motion [5,8,9,10]. Medications such as nonsteroidal anti-inflammatory drugs or oral corticosteroids [11], as well as intra-articular injection therapy [12,13,14,15], are also commonly used to improve pain and passive range of motion. Although many treatment options exist, patients often have difficulty determining the order of treatments and the optimal timing of interventions. Accurate diagnosis and personalized treatment planning are therefore essential for the effective management of frozen shoulder, promoting functional improvement and recovery.
In recent years, artificial intelligence (AI) technologies in the medical field have paved the way for innovative approaches to patient management, diagnosis, and treatment [16,17,18]. AI applications have expanded significantly, covering diagnostic image analysis, predictive analytics, and direct patient interaction through conversational agents. Among these, natural language processing models such as ChatGPT, developed by OpenAI, show significant potential and have been reported to increase patient engagement and provide support in various aspects of treatment [19]. These studies suggest that ChatGPT offers an accessible, interactive means for patients to obtain information, receive psychological support, and engage in self-education even when immediate access to medical professionals is limited. Moreover, AI models can respond consistently to the needs of individual patients, facilitating personalized patient education and potentially increasing adherence to treatment plans.
Although ChatGPT has been found to be useful as a communication tool providing valuable information to patients with various health conditions, including general healthcare [16,17], cardiovascular and cerebrovascular diseases [20], hypertension [21], diabetes [22,23], inflammatory bowel disease [24], and herniated lumbar disk [25], its potential application for patients with frozen shoulder has not yet been established. ChatGPT could serve as an accessible resource for patients with frozen shoulder who lack immediate hospital access, allowing them to find information about the condition, receive answers to their questions, and obtain relevant guidance. By improving patients' understanding of and engagement with their disease, ChatGPT may help patients with frozen shoulder manage their condition more actively.
Therefore, this study aims to comprehensively evaluate the validity, safety, and utility of ChatGPT as an informational and communicative tool for patients with frozen shoulder seeking up-to-date knowledge of their condition and treatment. By assessing ChatGPT's ability to provide accurate, useful, and safe responses, we seek to clarify its potential role in improving patient education and engagement, which could contribute to more effective treatment and management of frozen shoulder.

2. Materials and Methods

Five physicians (S.Y.Y., M.C.C., J.J., K.H., Y.G.Y.), each with over a decade of clinical experience in musculoskeletal disorders, collaborated to develop a list of commonly asked questions about frozen shoulder drawn from their clinical practice. The five physicians, alongside the consulting physician, engaged in iterative discussions over five sessions. In cases of disagreement, a structured voting process was used, with a majority required to finalize decisions. The appropriateness of each question was evaluated using a 5-point Likert scale, with scores defined as follows: 1, highly inappropriate; 2, inappropriate; 3, neutral; 4, appropriate; 5, highly appropriate. Each question was revised iteratively until all experts assigned a score of 4 or higher. Additionally, in every round, an open-ended question ("Are there any additional questions you would recommend including?") was posed to identify any missing or supplementary inquiries. The process continued over five rounds, during which the questions were refined based on expert feedback, and consensus was ultimately achieved for all questions, ensuring their suitability for the study's objectives. To assess the comprehensiveness of the selected questions from the patient perspective, each participating clinician conducted structured interviews with 20 patients diagnosed with frozen shoulder (n = 120 in total). Patients were asked whether they had any additional questions or concerns not covered by the preliminary questionnaire. The responses were analyzed qualitatively, and no major additional concerns emerged, supporting the validity of the selected questions. Following this review process, the authors selected the following 14 questions as the most representative of patients' informational needs regarding frozen shoulder.
  • What is frozen shoulder (adhesive capsulitis)?
  • What causes frozen shoulder?
  • What risk factors or conditions increase the likelihood of developing frozen shoulder?
  • What are the common symptoms of frozen shoulder?
  • How is frozen shoulder diagnosed?
  • What treatment options are available for frozen shoulder?
  • How long does recovery from frozen shoulder typically take?
  • What exercises are recommended for managing frozen shoulder?
  • What types of injections are available for patients with frozen shoulder?
  • How many injections are recommended for patients with frozen shoulder?
  • What effect does frozen shoulder have on everyday activities?
  • What measures can be taken to prevent adhesive capsulitis?
  • How often does frozen shoulder recur?
  • Under what circumstances should surgery be considered for treating frozen shoulder?
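The question-retention rule used in the selection process above can be sketched abstractly as follows. This is an illustration of the stated stopping criterion only, not the authors' procedure in code form, and the function names are our own:

```python
# Abstract sketch of the question-retention rule described above:
# a candidate question is finalized only when all five experts rate it
# 4 ("Appropriate") or higher on the 5-point scale; otherwise it is
# revised and re-rated in the next round.

def finalized(ratings: list[int]) -> bool:
    """True when every one of the five expert ratings is >= 4."""
    return len(ratings) == 5 and min(ratings) >= 4

def needs_revision(ratings: list[int]) -> bool:
    """A question goes back for rewording if any expert scores it below 4."""
    return not finalized(ratings)
```

For example, `finalized([5, 4, 4, 5, 4])` is True, while `finalized([5, 4, 3, 5, 4])` is False because one expert rated the question below "Appropriate".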

2.1. Evaluation of ChatGPT’s Effectiveness for Frozen Shoulder

Using ChatGPT 4.0, the latest ChatGPT model at the time of the study, the authors generated answers by entering each of the 14 questions described above sequentially. The responses generated by ChatGPT did not include explicit references or citations to external sources. To evaluate the quality of the responses, a 5-point Likert scale was used, and responses were rated in terms of validity, safety, and utility. To address concerns about potential subjectivity in Likert scoring, a detailed calibration process was employed before the evaluation began. This process involved the following steps: (1) development of a detailed scoring rubric; (2) evaluator training; (3) review of sample responses; (4) consensus building; and (5) pilot scoring and refinement. In detail, a scoring rubric was created with clear, specific definitions for each score on the 5-point Likert scale across the dimensions of validity, safety, and utility. Discrepancies in scoring during the sample review were identified and resolved through discussion, and the evaluators reached a consensus on the interpretation and application of the criteria to ensure consistent scoring. Before the formal evaluation, the scoring rubric was piloted on additional ChatGPT responses. The Likert scale ratings for each criterion (validity, safety, and utility) were defined as follows.
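As a rough illustration of the querying step, each question could be submitted programmatically. This is not the authors' code: the request shape follows OpenAI's public chat-completions API, and the helper names and model identifier are our own assumptions:

```python
# Illustrative sketch only (not the study's pipeline): each of the 14
# questions is sent verbatim to the OpenAI chat-completions endpoint.
# Helper names are assumptions; an API key is required to actually call it.
import json
import urllib.request

QUESTIONS = [
    "What is frozen shoulder (adhesive capsulitis)?",
    "What causes frozen shoulder?",
    "What risk factors or conditions increase the likelihood of developing frozen shoulder?",
    "What are the common symptoms of frozen shoulder?",
    "How is frozen shoulder diagnosed?",
    "What treatment options are available for frozen shoulder?",
    "How long does recovery from frozen shoulder typically take?",
    "What exercises are recommended for managing frozen shoulder?",
    "What types of injections are available for patients with frozen shoulder?",
    "How many injections are recommended for patients with frozen shoulder?",
    "What effect does frozen shoulder have on everyday activities?",
    "What measures can be taken to prevent adhesive capsulitis?",
    "How often does frozen shoulder recur?",
    "Under what circumstances should surgery be considered for treating frozen shoulder?",
]

def build_request(question: str, model: str = "gpt-4") -> dict:
    """Assemble one chat-completions payload for a single patient question."""
    return {"model": model, "messages": [{"role": "user", "content": question}]}

def ask(question: str, api_key: str) -> str:
    """POST one question to the API and return the reply text (needs a key)."""
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(build_request(question)).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

In the study itself, the questions were entered through the ChatGPT web interface; the sketch simply shows how the same sequential querying could be reproduced and audited.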

2.1.1. Validity

  • Score 1, completely erroneous information: the response cannot be verified against proven medical sources, or the information is inaccurate or substantially incomplete.
  • Score 2, mostly erroneous information: the response consists primarily of information that cannot be identified in medical sources, appears highly inaccurate, or is incomplete in several critical aspects.
  • Score 3, partially erroneous information: some accurate information exists, although incomplete, and some components of the response are inaccurate or unverified by medical sources.
  • Score 4, reliable but incomplete information: all information in the response can be found in reputable medical sources and is accurate, but complete details may be lacking in some respects.
  • Score 5, completely reliable and complete information: the response is entirely consistent with verified medical sources, providing accurate and comprehensive information without omissions.

2.1.2. Safety

  • Score 1, significant and emergent danger to the patient's condition: the information provided poses a serious risk to the patient's health, and following the guidance without modification is likely to cause emergent and serious side effects or harm.
  • Score 2, potentially high risk to the patient's condition: the information contains elements that could cause serious harm or exacerbation and is likely to cause serious health problems if followed by the patient.
  • Score 3, moderate potential risk to the patient's condition: the information poses a moderate risk, and following it without modification or adjustment could negatively affect the patient's health over time.
  • Score 4, minimal potential risk to the patient's condition: the information presents only minor risks to the patient's health, and while some inaccuracies may require attention, it is unlikely to cause harm.
  • Score 5, no risk to the patient's condition: the information provided is safe and accurate, agrees with proven medical guidelines, and poses no risk to the patient's condition if followed.

2.1.3. Utility

  • Score 1, not useful to patients at all: the response lacks relevant or practical information that could help patients understand or manage their condition.
  • Score 2, mostly not useful to patients: less than 25% of the content provides practical information or useful guidance; most of the information is irrelevant, unclear, or lacking applicability.
  • Score 3, partly useful to patients: the response contains some relevant and practical information, with between 25% and 50% of the content potentially beneficial to the patient, but a significant portion remains unhelpful or insufficiently informative.
  • Score 4, moderately useful to patients: the response provides quite useful information, and more than 50% of the content is relevant and applicable to understanding or managing the patient's condition, but it is not fully comprehensive.
  • Score 5, completely useful to patients: the response is entirely useful; 100% of the information is directly applicable, relevant, and valuable, providing complete guidance to fully support the patient's needs.

2.2. Ethics

Ethical approval for this study was waived by the Institutional Review Board (IRB) of Ewha Womans University (IRB No. 2024-06-051), as the study did not involve human or animal subjects and thus posed none of the ethical concerns typically associated with clinical research involving living subjects, allowing it to be conducted without additional ethical review protocols.

3. Results

ChatGPT provided answers to the questions about frozen shoulder as follows. Question 1 asked what frozen shoulder is, and ChatGPT concisely summarized the stages of progression, causes, symptoms, diagnosis, and treatment of the condition. Question 2 asked about the causes of frozen shoulder, and ChatGPT explained common causes such as age, gender, immobilization, diabetes, thyroid disorders, and inflammatory diseases, providing relatively broad information about the factors that can lead to frozen shoulder along with an overview of its pathophysiology. Question 3 asked about the risk factors for frozen shoulder, and ChatGPT presented a broad and diverse range of risk factors, including age, gender, chronic diseases, cardiovascular disease, neurological conditions, autoimmune diseases, genetic factors, and lifestyle factors. Regarding the symptoms of frozen shoulder (Question 4), ChatGPT provided detailed descriptions of the symptoms in each stage (freezing, frozen, and thawing), including less common ones such as muscle weakness and sleep disturbances. Question 5 concerned the diagnosis of frozen shoulder, and the answer included information about medical history, physical examination, imaging tests, diagnostic criteria, and differential diagnosis.
ChatGPT also provided answers for the prognosis and various treatment methods for adhesive capsulitis. Question 6 was about treatment options, and it described both non-surgical treatments, such as physical therapy and medication, as well as surgical options. It also briefly described lifestyle remedies, adjunctive treatment methods, and prognosis. Question 7 was about recovery from frozen shoulder, and the response provided detailed duration estimates for each stage. It also discussed the overall recovery time and factors that may influence recovery. Question 8 was about various exercise methods for treating frozen shoulder, and the response included commonly recommended exercises for range of motion, such as the pendulum stretch, towel stretch, and finger walk. It also suggested a variety of stretching and strengthening exercises. These exercises were described clearly and concisely, making it easy for people to learn about them.
Furthermore, Question 9 asked which injection therapies can be applied for frozen shoulder, and the response covered various injection methods such as steroid injections, hydrodilatation, platelet-rich plasma (PRP) injections, and local anesthetic injections. Regarding the number of injections (Question 10), ChatGPT recommended that injection therapies be limited to 2–3 times within a 6-month period and noted factors that may influence the number of injections. Question 11 asked about the impact of frozen shoulder on daily life, and ChatGPT's answer described in detail how the condition can affect activities of daily living, such as dressing, grooming, and bathing, as well as household tasks, recreational activities, sleep, and social interactions. To prevent frozen shoulder (Question 12), ChatGPT suggested regular shoulder exercises, early mobilization after surgery, managing systemic diseases, and lifestyle modifications. Question 13 asked about the recurrence of frozen shoulder, and ChatGPT provided specific figures, such as a recurrence rate of 5–10%, and detailed the risk factors associated with recurrence and methods for prevention. Lastly, Question 14 addressed when surgery is considered for frozen shoulder; the answer explained that surgery is typically considered in cases of prolonged symptoms, severe pain or stiffness, or the presence of other conditions, and provided relatively specific details on the types of surgeries performed, such as manipulation under anesthesia and arthroscopic capsular release.
After a thorough review, the authors evaluated the ChatGPT answers. Table 1 summarizes the scores of the 14 responses generated by ChatGPT, which were evaluated by rehabilitation medicine physicians using a 5-point Likert scale. The responses from ChatGPT can be accessed at https://chatgpt.com/share/ed83219f-18c4-4674-9f60-14bf704c5432 (accessed on 7 February 2025) and are included as Supplementary Data S1. In terms of validity, 12 out of 14 questions (85.7%) were rated 5 points, with questions 5 and 6 scoring 4 points. For safety, all questions except one (question 10) received a score of 5, resulting in 92.9% of questions being rated 5 points. Regarding utility, 12 out of 14 questions (85.7%) were rated 5 points, with questions 5 and 6 scoring 4 points. Considering the validity, safety, and utility of the responses, the answers provided by ChatGPT were generally rated highly.
Table 2 presents a detailed summary of the scores given to 14 ChatGPT responses, evaluated by orthopedic surgeons using a 5-point Likert scale. The evaluation of ChatGPT’s responses showed results consistent with those assessed by rehabilitation medicine physicians. In terms of validity, 12 out of 14 questions (85.7%) received the highest score of 5 points, with questions 5 and 6 scoring 4 points. For safety, 13 out of 14 questions (92.9%) were rated 5 points, with question 10 receiving a slightly lower score of 4 points. Regarding utility, 13 out of 14 questions (92.9%) were also rated 5 points, with question 6 scoring 4 points. ChatGPT’s responses were generally rated highly, demonstrating strong validity, safety, and utility, and aligning closely with the findings of the rehabilitation medicine physicians.
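The percentage summaries above follow directly from the per-question scores. A minimal sketch, with the score vectors transcribed from the rehabilitation medicine physicians' ratings reported in the text (not machine-read from Table 1):

```python
# Tally the share of the 14 responses rated 5, per criterion.
# Scores transcribed from the text: validity and utility had questions
# 5 and 6 at 4 points; safety had question 10 at 4 points.

validity = [5, 5, 5, 5, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5]  # questions 1-14
safety   = [5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 5, 5, 5, 5]
utility  = [5, 5, 5, 5, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5]

def share_rated_5(scores: list[int]) -> float:
    """Percentage of the 14 responses receiving the top score, one decimal."""
    return round(100 * scores.count(5) / len(scores), 1)

print(share_rated_5(validity), share_rated_5(safety), share_rated_5(utility))
# 85.7 92.9 85.7
```

The same function reproduces the orthopedic surgeons' figures when question 10's safety score and question 6's utility score are set to 4 instead.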
In question 5 (How is frozen shoulder diagnosed?), both the validity and utility scores were rated 4 by the rehabilitation medicine physicians. The orthopedic surgeons assigned a score of 4 for validity while rating both utility and safety at 5 (Table 2). The answer lacked a detailed description of the extent of range of motion limitation that would raise suspicion of frozen shoulder, and it did not detail the specific tests that could assist in the diagnosis. In addition, the answer did not describe the typical ultrasound or MRI findings commonly observed in patients with frozen shoulder.
The validity and utility scores for question 6 (What treatment options are available for frozen shoulder?) were rated 4 points because the response failed to mention several treatment options, such as hot packs, infrared radiation, shortwave diathermy, ultrasound therapy, laser therapy, extracorporeal shockwave therapy (ESWT), and manual therapy, which have been reported to be effective in treating frozen shoulder.
For question 10 (How many injections are recommended for patients with frozen shoulder?), the safety score was rated 4 points. The deduction in the safety category reflects the fact that the response did not mention the various side effects of steroid injections. In addition, although the response noted that caution is needed when administering steroid injections to diabetic patients ("patients with conditions like diabetes may need careful monitoring due to potential effects on blood sugar levels"), it did not explain why monitoring is needed or how to prevent detrimental effects.

4. Discussion

In this study, we evaluated the validity, safety, and utility of ChatGPT for providing information to patients with frozen shoulder. Among the 14 most frequently asked questions by these patients, ChatGPT provided completely reliable and comprehensive information for 85.7% of the questions. Additionally, 92.9% of the answers were deemed to pose no danger, and 85.7% were considered entirely useful. While not all answers received a perfect score, they were all rated as having relatively satisfactory validity, safety, and utility, each scoring a 4 or more. Therefore, we conclude that ChatGPT’s responses to questions about frozen shoulder are generally valid, safe, and useful.
Questions 5, 6, and 10 received relatively lower scores than the other questions for the following reasons. First, there was a lack of information about how frozen shoulder is diagnosed in clinical practice (question 5). On physical examination, frozen shoulder is characterized by restricted active and passive ROM in at least two planes. While there is no definitive consensus on the specific limitations, frozen shoulder may be suspected when forward flexion is less than 100°, external rotation is less than 10°, and internal rotation is limited below the L5 vertebral level [26,27]. Frozen shoulder may also be diagnosed if passive external rotation is restricted to less than 50% of that of the opposite shoulder. Providing a simple description of ROM restrictions can help patients understand whether they might have a frozen shoulder [28]. Additionally, there was an insufficient description of physical examination tests, such as the coracoid pain test [4,7,29] and the Apley scratch test [30], that would help determine the presence of frozen shoulder. The descriptions of MRI and ultrasound findings also lacked detail. Ultrasound (US) imaging has recently emerged as a diagnostic tool, commonly identifying thickening of the coracohumeral ligament, axillary pouch, and rotator interval joint capsule, in addition to obliteration of the subcoracoid fat triangle and capsular thickening [2,4,7]. MRI features, such as enhancement of the rotator interval and axillary joint capsule and the signal intensity of the inferior glenohumeral ligament, have also been reported to show high sensitivity and specificity for diagnosing frozen shoulder [31].
ChatGPT also did not provide sufficient information about therapeutic options for frozen shoulder, failing to describe several effective treatment methods in detail (question 6). Previous studies have strongly recommended ESWT and laser therapy for pain relief and increasing the range of motion [32,33,34]. ESWT is effective in treating patients with frozen shoulder, as it stimulates the inflammatory-mediated healing process, enhances local blood circulation, increases the extensibility of collagen fibers, and improves the range of motion [32]. High-intensity laser therapy has been reported to improve pain and quality of life in patients with frozen shoulder; it can penetrate deep into the joints and muscles, inducing analgesic, anti-inflammatory, and anti-edematous effects by reducing inflammatory mediators and enhancing circulation [33,34]. The application of various manual therapy techniques, including Kaltenborn mobilization, scapular mobilization, and manual muscle release, has also been reported to be effective for pain relief and improvement in range of motion [8,9]. ChatGPT's omission of such therapies highlights its current limitations in providing comprehensive treatment guidance. While these therapies are evidence-based and have demonstrated efficacy in specific patient populations, they are generally considered complementary to foundational treatments such as physical therapy, NSAIDs, and corticosteroid injections.
Regarding the safety of steroid injections (question 10), it has been reported that intra-articular steroid injections can cause various side effects such as pain, headache, rash, flushing, menstrual irregularity, atrophy of the subcutaneous fat tissue, tendon rupture, increased infection rate, weight gain, and avascular necrosis of the bone [35]. A high dose of steroid injection can increase the likelihood of these side effects, and it has been reported that patients who received a high-dose steroid injection showed significantly higher mean blood sugar levels at 6 weeks compared to those with a low dose [36]. The response lacked sufficient warnings and detailed explanations regarding this issue, as steroid injections can significantly impede blood sugar control in these individuals. Patients with diabetes may not anticipate that steroid injections for frozen shoulder can impact their glucose levels, so providing warnings is necessary.
The differentiation between scores of 4 and 5, particularly for validity and utility, reflects the level of completeness in ChatGPT’s responses. For example, in Question 5, while the response was accurate, it lacked details about diagnostic imaging findings and specific physical tests, resulting in a validity score of 4. In Question 6, the omission of adjunctive therapies like ESWT limited the utility of the response, leading to a score of 4. This scoring framework ensured consistent evaluation and highlighted areas where ChatGPT provided partially complete information. Future studies could refine this framework by adopting more granular scoring rubrics or incorporating inter-rater reliability assessments to further validate scoring consistency.
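One concrete form the suggested inter-rater reliability assessment could take is Cohen's kappa between the two evaluator groups. The sketch below is generic and uses hypothetical ratings, not data from this study:

```python
# Cohen's kappa for two raters scoring the same items: the proportion of
# observed agreement corrected for the agreement expected by chance.
# Illustrative only; the example ratings are made up.
from collections import Counter

def cohens_kappa(r1: list[int], r2: list[int]) -> float:
    """(observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(r1)
    po = sum(a == b for a, b in zip(r1, r2)) / n            # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    pe = sum(c1[k] * c2[k] for k in c1.keys() & c2.keys()) / n ** 2  # chance
    return (po - pe) / (1 - pe)

# Two hypothetical raters scoring four responses on the 5-point scale:
print(cohens_kappa([5, 5, 4, 5], [5, 4, 4, 5]))  # 0.5
```

Note that kappa is undefined when chance agreement equals 1 (both raters assigning a single identical category throughout); a production implementation, or `sklearn.metrics.cohen_kappa_score`, should handle that edge case.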
A recent meta-analysis reported that there were no clinically important differences between low-dose and high-dose steroid injections (triamcinolone acetonide 20 mg vs. 40 mg) and recommended a low-dose steroid injection considering these side effects [36]. For diabetic patients, lower doses of corticosteroids should be considered, as high doses can elevate short-term glucose levels. It would be helpful to provide patients with detailed information about the importance of the steroid dosage and make them aware of the various side effects of steroid injections. For high-risk patients, such as those with diabetes or immunosuppressive conditions, steroid injection recommendations must be paired with clear warnings emphasizing the need for consultation with a healthcare professional.
The findings of the present study align with the existing literature emphasizing the potential of AI tools like ChatGPT to promote user engagement and understanding in both healthcare and educational settings. Shahsavar and Choudhury [37] demonstrated that even though ChatGPT is not specifically designed for healthcare, people who are familiar with it tend to ask questions related to health conditions. Similarly, the study by Ng et al. [38] highlights the effectiveness of generative AI technologies in promoting self-regulated learning through personalized feedback and adaptable responses. Applying these insights to healthcare, ChatGPT could be further refined to support patient self-education by encouraging proactive engagement with health information, thereby improving health literacy and decision-making. By integrating self-regulated learning principles and tailoring responses to individual contexts, ChatGPT appears able to provide reliable and verified answers to patients with frozen shoulder, which can help increase their awareness of the disease, and it has the potential to serve as an even more effective tool for patient education and engagement. However, ensuring its safe and effective use requires collaboration among healthcare providers, AI developers, and policy-makers to regularly update its knowledge base, validate its outputs, and enhance transparency about its limitations.
Overall, ChatGPT provided useful information about what frozen shoulder is and how it is diagnosed and treated. Some questions did not receive a full score for validity, safety, or utility. However, in clinical practice, it is challenging for physicians to convey all relevant information in response to patients' questions, and there is still no clear consensus on a single treatment method that fully cures frozen shoulder. The information provided by ChatGPT was, overall, very useful and appears likely to help patients understand this condition.
The limitations of this study are as follows. First, ChatGPT did not appear to include up-to-date information on the diagnosis and treatment of frozen shoulder. Second, the data on which ChatGPT was trained may be biased or outdated; patients should be aware that the accuracy of ChatGPT's answers may decrease when only limited published information is available. Additionally, the answers provided by ChatGPT may vary over time, and it is uncertain whether people without a medical background would understand them. For the safety assessment, our primary focus was on whether the provided information could pose any harm to patients; however, a thorough safety assessment should also encompass the broader clinical context, including patient comorbidities and concurrent treatments. The use of a 5-point Likert scale by evaluators also introduces some degree of subjectivity. Although the evaluators in this study are experienced clinicians, the results could vary if assessed by different individuals; future research should include inter-rater reliability analysis or a larger number of evaluators to enhance consistency and reduce bias. It is also important to note that the notion of recent versus outdated data applies only loosely to ChatGPT's training dataset, which consists of a diverse, static collection of information up to a specific cutoff date. As such, users should critically evaluate all outputs rather than relying on them uncritically; recognizing these limitations allows researchers and practitioners to better contextualize AI-generated outputs and mitigate the risks associated with outdated or incomplete data. Additionally, this study used ChatGPT 4.0 at a single time point and did not track longitudinal changes; further studies that monitor AI updates and assess consistency over time are needed.
A notable limitation of this study is that ChatGPT’s responses were collected at a single time point, with no repeated attempts to evaluate consistency across multiple trials. This decision reflected the exploratory nature of the study, which aimed to assess the feasibility and initial validity of using ChatGPT in this context; future studies should incorporate repeated trials across different time points and conditions. Another limitation of ChatGPT is its inability to provide references or source citations for its recommendations, which prevents users from directly verifying that its responses align with current clinical guidelines or evidence-based practice. We mitigated this by relying on expert clinicians to evaluate the content, but that approach is inherently subjective. Future versions of ChatGPT and similar AI tools would benefit greatly from integrating explicit references into their responses, enhancing transparency and allowing users to trace the provided information back to its source. Furthermore, ChatGPT’s responses are fundamentally generic: the model does not consider individual patient circumstances, preferences, or cultural sensitivities, which may yield less relevant or non-specific advice for some users. Future studies could explore methods of incorporating patient-specific variables to improve the relevance and applicability of AI-generated information, along with strategies to minimize potential risks and negative outcomes. The variability of ChatGPT’s responses can also confuse users without a medical background; usage guidance should therefore encourage clear, specific queries and cross-checking of information against reliable sources.
Responses to questions such as “How is frozen shoulder diagnosed?” can be improved by incorporating standardized diagnostic criteria into the model’s training data. For example, including specific details about range of motion (ROM) limitations, validated clinical tests (e.g., coracoid pain test, Apley scratch test), and imaging findings (e.g., thickening of the coracohumeral ligament on ultrasound or enhancement of the rotator interval on MRI) could provide users with more robust and practical guidance. Integrating structured medical databases, such as PubMed, alongside up-to-date diagnostic protocols would be an effective approach to achieving this improvement. Additionally, enabling ChatGPT to provide patient-specific recommendations requires training the model on diverse clinical case scenarios. For instance, answering questions like “How many steroid injections are recommended?” with tailored warnings for specific patient groups, such as those with diabetes, would necessitate the AI recognizing risk factors and adjusting its guidance accordingly. This could be achieved by developing algorithms that dynamically modify responses based on patient profiles or by incorporating datasets emphasizing condition-specific risks and management strategies. Currently, ChatGPT’s responses lack information on evidence-based adjunctive therapies, such as extracorporeal shockwave therapy (ESWT) or high-intensity laser therapy, which have demonstrated efficacy in clinical trials. Addressing this gap would require routine updates to the model’s training data with curated, peer-reviewed literature, clinical practice guidelines, and meta-analyses. Furthermore, establishing partnerships with medical organizations to provide continuous input on advancements in treatment modalities would further enhance the model’s scope and accuracy. 
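The kind of profile-aware adjustment described above can be illustrated with a minimal rule-based sketch. Everything here is hypothetical: the answer text, risk-factor names, and warning messages are illustrative placeholders, not features of ChatGPT or any deployed system. The idea is simply a post-processing layer that appends condition-specific cautions to a generic answer based on known patient risk factors.

```python
# Hypothetical sketch: append condition-specific warnings to a generic answer
# depending on risk factors recorded in a patient profile. All strings and
# rule names below are illustrative assumptions.

GENERIC_ANSWER = (
    "Corticosteroid injections are commonly limited to a small number per "
    "shoulder per year; your physician will advise on timing and dose."
)

# Illustrative rule table: risk factor -> tailored warning
RISK_WARNINGS = {
    "diabetes": "Steroid injections can transiently raise blood glucose; "
                "monitor levels closely after the injection.",
    "anticoagulant_use": "Injections may carry a higher bleeding risk; "
                         "tell your physician about any blood thinners.",
}

def tailor_answer(profile: set[str]) -> str:
    """Append a warning for every risk factor present in the patient profile."""
    matched = sorted(profile & RISK_WARNINGS.keys())
    return " ".join([GENERIC_ANSWER] + [RISK_WARNINGS[r] for r in matched])

print(tailor_answer({"diabetes"}))
```

A production system would need clinically validated rules and clinician oversight; the sketch only shows the shape of the mechanism, in which patient attributes select additional guidance rather than altering the underlying model.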
By incorporating these features, AI models like ChatGPT could provide more comprehensive, patient-specific, and up-to-date guidance, thereby increasing their utility in both patient education and clinical practice. Such improvements would require ongoing collaboration among AI developers, clinicians, and medical organizations to ensure alignment with the latest advancements in healthcare.
ChatGPT’s responses are generated from training data and may not reflect the most up-to-date medical guidelines or research. Users should therefore consult a qualified healthcare provider for any questions or concerns about their medical condition or treatment options, and patients should treat ChatGPT as a supplementary tool rather than as their sole source of medical information.

5. Conclusions

In conclusion, ChatGPT demonstrated generally high validity, safety, and usefulness in providing information for patients with frozen shoulder. Its principal advantage is the ability to answer questions about medical conditions instantly, and anyone without a medical background can use it anytime, anywhere. ChatGPT’s responses should be used as a supplementary tool for patient education, not as a replacement for professional medical advice, diagnosis, or treatment. Users must be aware of its limitations, including the potential for incomplete or erroneous responses, outdated information, and the absence of individualized guidance. Left unaddressed, these limitations pose risks to patient health, particularly when AI-generated information is used without expert validation. To integrate ChatGPT into patient care safely, healthcare providers and users should cross-check its recommendations against up-to-date clinical guidelines. Used responsibly, ChatGPT can be a valuable resource for enhancing patient education and engagement, but it should always be complemented by professional medical expertise.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/life15020262/s1, Data S1: The detailed answers of ChatGPT for frozen shoulder.

Author Contributions

Conceptualization, S.Y. and Y.G.Y.; Formal analysis, Y.G.Y.; Investigation, Y.K., M.C.C., J.J., K.H. and Y.G.Y.; Methodology, S.Y., Y.K., M.C.C., J.J., K.H. and Y.G.Y.; Supervision, S.Y., Y.K. and Y.G.Y.; Validation, Y.G.Y.; Writing—original draft, S.Y. and Y.G.Y.; Writing—review and editing, S.Y., Y.K., M.C.C., J.J., K.H. and Y.G.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and ethical approval was exempted by the Ewha Womans University Institutional Review Board (IRB) as the study did not involve human or animal subjects (IRB No. 2024-06-051, Approved date: 31 July 2024).

Informed Consent Statement

Patient consent was waived due to the nature of the study.

Data Availability Statement

Data are available in the Supplementary Materials.

Conflicts of Interest

The authors declare no conflicts of interest.

Table 1. Evaluation of ChatGPT responses using a 5-point Likert scale by rehabilitation medicine physicians.

| # | Question | Validity | Safety | Utility | Total |
|---|----------|----------|--------|---------|-------|
| 1 | What is frozen shoulder (adhesive capsulitis)? | 5 | 5 | 5 | 15 |
| 2 | What causes frozen shoulder? | 5 | 5 | 5 | 15 |
| 3 | What risk factors or conditions increase the likelihood of developing frozen shoulder? | 5 | 5 | 5 | 15 |
| 4 | What are the common symptoms of frozen shoulder? | 5 | 5 | 5 | 15 |
| 5 | How is frozen shoulder diagnosed? | 4 | 5 | 4 | 13 |
| 6 | What treatment options are available for frozen shoulder? | 4 | 5 | 4 | 13 |
| 7 | How long does recovery from frozen shoulder typically take? | 5 | 5 | 5 | 15 |
| 8 | What exercises are recommended for managing frozen shoulder? | 5 | 5 | 5 | 15 |
| 9 | What types of injections are available for patients with frozen shoulder? | 5 | 5 | 5 | 15 |
| 10 | How many steroid injections are recommended for patients with frozen shoulder? | 5 | 4 | 5 | 14 |
| 11 | What effect does frozen shoulder have on everyday activities? | 5 | 5 | 5 | 15 |
| 12 | What measures can be taken to prevent adhesive capsulitis? | 5 | 5 | 5 | 15 |
| 13 | How often does frozen shoulder recur? | 5 | 5 | 5 | 15 |
| 14 | Under what circumstances should surgery be considered for treating frozen shoulder? | 5 | 5 | 5 | 15 |
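The score distributions summarized in the Abstract follow directly from the per-question ratings in Table 1. As a quick check (using the Table 1 values, with `pct_top` as a small helper defined here for illustration):

```python
# Ratings from Table 1 (rehabilitation medicine physicians), questions 1-14.
validity = [5, 5, 5, 5, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5]
safety   = [5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 5, 5, 5, 5]
utility  = [5, 5, 5, 5, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5]

def pct_top(scores, top=5):
    """Percentage of questions that received the top Likert score."""
    return round(100 * scores.count(top) / len(scores), 1)

print(pct_top(validity))  # 85.7
print(pct_top(safety))    # 92.9
print(pct_top(utility))   # 85.7
```

Twelve of fourteen validity and utility responses scored 5 (85.7%), and thirteen of fourteen safety responses scored 5 (92.9%), matching the figures reported in the Abstract.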
Table 2. Evaluation of ChatGPT responses using a 5-point Likert scale by orthopedic surgeons.

| # | Question | Validity | Safety | Utility | Total |
|---|----------|----------|--------|---------|-------|
| 1 | What is frozen shoulder (adhesive capsulitis)? | 5 | 5 | 5 | 15 |
| 2 | What causes frozen shoulder? | 5 | 5 | 5 | 15 |
| 3 | What risk factors or conditions increase the likelihood of developing frozen shoulder? | 5 | 5 | 5 | 15 |
| 4 | What are the common symptoms of frozen shoulder? | 5 | 5 | 5 | 15 |
| 5 | How is frozen shoulder diagnosed? | 4 | 5 | 5 | 13 |
| 6 | What treatment options are available for frozen shoulder? | 4 | 5 | 4 | 13 |
| 7 | How long does recovery from frozen shoulder typically take? | 5 | 5 | 5 | 15 |
| 8 | What exercises are recommended for managing frozen shoulder? | 5 | 5 | 5 | 15 |
| 9 | What types of injections are available for patients with frozen shoulder? | 5 | 5 | 5 | 15 |
| 10 | How many steroid injections are recommended for patients with frozen shoulder? | 5 | 4 | 5 | 14 |
| 11 | What effect does frozen shoulder have on everyday activities? | 5 | 5 | 5 | 15 |
| 12 | What measures can be taken to prevent adhesive capsulitis? | 5 | 5 | 5 | 15 |
| 13 | How often does frozen shoulder recur? | 5 | 5 | 5 | 15 |
| 14 | Under what circumstances should surgery be considered for treating frozen shoulder? | 5 | 5 | 5 | 15 |
