Article

Breaking Bones, Breaking Barriers: ChatGPT, DeepSeek, and Gemini in Hand Fracture Management

1
Plastic Surgery Unit, Department of Medicine, Surgery and Neuroscience, University of Siena, 53100 Siena, Italy
2
Department of Plastic and Reconstructive Surgery, Peninsula Health, Frankston, VIC 3199, Australia
*
Author to whom correspondence should be addressed.
J. Clin. Med. 2025, 14(6), 1983; https://doi.org/10.3390/jcm14061983
Submission received: 13 February 2025 / Revised: 1 March 2025 / Accepted: 13 March 2025 / Published: 14 March 2025

Abstract
Background: Hand fracture management requires precise diagnostic accuracy and complex decision-making. Advances in artificial intelligence (AI) suggest that large language models (LLMs) may assist or even rival traditional clinical approaches. This study evaluates the effectiveness of ChatGPT-4o, DeepSeek-V3, and Gemini 1.5 in diagnosing and recommending treatment strategies for hand fractures compared to experienced surgeons. Methods: A retrospective analysis of 58 anonymized hand fracture cases was conducted. Clinical details, including fracture site, displacement, and soft-tissue involvement, were provided to the AI models, which generated management plans. Their recommendations were compared to actual surgeon decisions, assessing accuracy, precision, recall, and F1 score. Results: ChatGPT-4o demonstrated the highest accuracy (98.28%) and recall (91.74%), effectively identifying most correct interventions but occasionally proposing extraneous options (precision 58.48%). DeepSeek-V3 showed moderate accuracy (63.79%), with balanced precision (61.17%) and recall (57.89%), sometimes omitting correct treatments. Gemini 1.5 performed poorly (accuracy 18.97%), with low precision and recall, indicating substantial limitations in clinical decision support. Conclusions: AI models can enhance clinical workflows, particularly in radiographic interpretation and triage, but their limitations highlight the irreplaceable role of human expertise in complex hand trauma management. ChatGPT-4o demonstrated promising accuracy but requires refinement. Ethical concerns regarding AI-driven medical decisions, including bias and transparency, must be addressed before widespread clinical implementation.

1. Introduction

Hand trauma represents a critical and highly specialized domain within reconstructive and orthopedic surgery, often demanding precise diagnostic acumen and advanced surgical skills. The clinical management of hand fractures involves restoring skeletal integrity and preserving critical structures such as tendons, nerves, and vasculature, which are essential for functional recovery. Even seemingly straightforward fractures can harbor subtleties that influence surgical decision-making and rehabilitation protocols. In this context, achieving optimal outcomes requires a comprehensive understanding of hand anatomy, fracture mechanics, and the psychosocial impact on patients who depend heavily on manual dexterity for daily activities and employment.
Recent advancements in artificial intelligence (AI) suggest that machine learning algorithms may support and, in some scenarios, rival traditional approaches to managing hand fractures [1]. Innovations in deep learning have facilitated the automated detection and classification of skeletal injuries in medical imaging, sometimes with diagnostic accuracies comparable to those of experienced clinicians. In the broader fields of plastic and reconstructive surgery, systematic reviews underscore AI’s ability to achieve diagnostic performance on par with human clinicians, highlighting both the rapid computational capabilities of neural networks and the expanding volume of medical imaging data available for algorithmic training [2,3]. However, the question remains whether these emerging technologies truly match the nuanced decision-making, tactile feedback, and operative judgment of an experienced hand surgeon, particularly when confronted with complex scenarios such as multi-fragment fractures, tendon or nerve compromise, and extensive soft-tissue damage.
To explore this question, this retrospective analysis critically compares AI-driven fracture diagnosis and treatment planning from three distinct AI systems, ChatGPT-4o, DeepSeek-V3, and Gemini 1.5, with the real-world expertise of hand surgeons who have managed a range of traumatic injuries [4,5]. On the one hand, meta-analyses and prospective studies in orthopedic and reconstructive specialties provide promising evidence of AI’s capacity to match or exceed human-level accuracy in radiographic interpretation [6,7,8,9]. These tools also offer potential benefits in workflow optimization, such as rapidly triaging high-priority cases and leveraging predictive analytics to forecast complications, for instance, delayed union or infection [10]. On the other hand, treating hand trauma involves a series of intricate clinical judgments that extend beyond the scope of most current AI models.
Crucially, while AI-based approaches can streamline workflow and refine predictive analytics, their integration raises key concerns about algorithmic bias, data quality, and ethical frameworks for patient autonomy and informed consent [11,12,13,14,15,16,17]. In this comparative perspective, we highlight AI’s potential and limitations by juxtaposing specific clinical scenarios ranging from simple metacarpal fractures to complex hand injuries involving microvascular repair across three AI platforms: ChatGPT-4o, DeepSeek-V3, and Gemini 1.5 [18].

2. Materials and Methods

In this retrospective analysis, fifty-eight anonymized cases of hand fractures were identified from our institutional database, encompassing a variety of mechanisms (e.g., punch injuries, sports-related trauma, crush accidents) and diverse patient demographics. Each case was characterized by pertinent clinical details, including the precise fracture site (metacarpals or phalanges), associated soft-tissue damage (such as nailbed lacerations), the presence or absence of comminution or displacement, and any relevant comorbidities. Patient consent was waived, as all patients had already undergone clinical management prior to this study. As a result, there was no potential for harm or risk to patients, and all data were fully de-identified to maintain confidentiality. This study was conducted according to the Declaration of Helsinki, with institutional ethical approval obtained from Victorian Ethics Review Management (LNR/97071/PH-2023).
Using these de-identified case profiles, we prompted three AI systems, ChatGPT-4o, DeepSeek-V3, and Gemini 1.5, to propose comprehensive management strategies. These strategies included preoperative evaluation, surgical or conservative treatment options, selection of fixation techniques (such as K-wire, plating, or dynamic external fixation), and any recommended ancillary procedures (such as tendon or nailbed repairs). The AI-generated plans were then compared directly to the definitive management documented by experienced hand surgeons, who employed a range of established operative methods, including the GAMP technique, Suzuki external fixation, and open reduction with internal fixation, as well as adjunct procedures for open fractures or soft-tissue repair. A comparative table was created to compile the fracture details, each AI model’s proposed approach, and the interventions performed for each patient. Discrepancies between AI recommendations and surgeon decisions regarding the choice of hardware, the necessity and timing of debridement, and postoperative immobilization were meticulously recorded. Ethical safeguards were implemented, ensuring that all patient identifiers were removed before review and that confidentiality and compliance with ethical research standards were maintained. The primary aim of this study was to evaluate the extent to which AI-generated treatment pathways align with routine clinical practice in the acute management of hand fractures, thereby assessing the potential of these AI systems to support and enhance surgical decision-making.

3. Results

The three AI systems, ChatGPT, DeepSeek, and Gemini, showed distinct performance profiles in accuracy, precision, recall, and F1 score. ChatGPT achieved the highest accuracy at 98.28%, coupled with a recall of 91.74%, indicating that it captured the majority of the correct intervention codes across cases. Its precision of 58.48% shows that, because it often proposed multiple options, its recommendations sometimes included extraneous codes. In contrast, DeepSeek achieved a moderate accuracy of 63.79% with a precision of 61.17%, meaning that roughly three in five of its proposed codes were correct; its slightly lower recall of 57.89% indicates that it overlooked some correct codes. Gemini performed notably lower across all metrics, with an accuracy of 18.97%, precision of 12.60%, recall of 14.68%, and an F1 score of 13.55%, indicating significant difficulty in predicting the correct codes. These results suggest that ChatGPT excels in comprehensive code identification, although its tendency to offer multiple options lowers its precision; DeepSeek provides more focused predictions at the cost of missing some correct codes; and Gemini requires substantial improvement to its predictive capabilities (Table 1, Table 2 and Table 3).
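As a quick consistency check on these figures, the F1 score can be recomputed from the reported precision and recall as their harmonic mean. The sketch below is illustrative only and is not the authors' script. It uses the rounded percentages quoted in the text, so the last digit may differ slightly from values computed on the raw data; for Gemini, the result agrees with the reported 13.55% to within rounding. The case counts 57, 37, and 11 are an inference, not values stated in the article: they are simply the counts out of 58 that reproduce the reported accuracies.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the text, expressed as fractions.
reported = {
    "ChatGPT-4o": (0.5848, 0.9174),
    "DeepSeek-V3": (0.6117, 0.5789),
    "Gemini 1.5": (0.1260, 0.1468),
}

for model, (precision, recall) in reported.items():
    print(f"{model}: F1 = {f1_score(precision, recall):.2%}")

# Case counts are NOT given in the text; 57, 37 and 11 are inferred here only
# because k / 58 reproduces the reported accuracies to two decimals.
N_CASES = 58
for k in (57, 37, 11):
    print(f"{k}/{N_CASES} = {k / N_CASES:.2%}")
```

If the inferred counts are right, accuracy was scored per case rather than per code, which would explain how ChatGPT can combine very high accuracy with modest precision.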

Statistical Analysis

Performance metrics, including accuracy, precision, recall, and F1 score, were calculated according to standard definitions in the literature. These calculations were implemented using custom Python scripts (Python version 3.8 or later) that leveraged the NumPy and scikit-learn libraries for data processing and metric computation. To ensure transparency and reproducibility, the formulas were cross-validated using both automated code verification and manual review. In addition, ChatGPT (version 4o) was employed as an auxiliary tool to generate and verify portions of the computational code, further supporting the robustness of our methodology. This hybrid approach, integrating automated processes with expert oversight, confirmed the accuracy of the obtained values and provided thorough documentation of the entire analytical workflow. All results were subsequently subjected to quality control checks to ensure their consistency and reliability.
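To make the metric definitions concrete, the following minimal sketch shows one way such a calculation could be set up when each recommendation is stored as a "+"-delimited string of intervention codes. It is not the authors' script. The code labels (`KWIRE`, `ORIF`, `GAMP`, `SPLINT`), the three example cases, the micro-averaging over all cases, and the "at least one matching code" reading of case-level accuracy are all assumptions made for illustration. It is written in plain Python so that it runs without NumPy or scikit-learn.

```python
from typing import Dict, Iterable, Set, Tuple


def parse_codes(cell: str) -> Set[str]:
    """Split a '+'-delimited string of intervention codes into a set."""
    return {code.strip().upper() for code in cell.split("+") if code.strip()}


def micro_scores(pairs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Micro-averaged precision/recall/F1 over (predicted, reference) pairs.

    'case_hit_rate' counts a case as correct when at least one proposed code
    matches the surgeon's codes (an assumed definition, see lead-in).
    """
    tp = fp = fn = hits = n_cases = 0
    for predicted, reference in pairs:
        pred, ref = parse_codes(predicted), parse_codes(reference)
        tp += len(pred & ref)   # proposed and actually performed
        fp += len(pred - ref)   # proposed but not performed
        fn += len(ref - pred)   # performed but not proposed
        hits += 1 if pred & ref else 0
        n_cases += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    hit_rate = hits / n_cases if n_cases else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "case_hit_rate": hit_rate}


# Hypothetical (model proposal, surgeon decision) pairs, NOT study data.
example_pairs = [
    ("KWIRE+SPLINT", "KWIRE"),
    ("ORIF", "ORIF"),
    ("SPLINT", "GAMP+KWIRE"),
]
scores = micro_scores(example_pairs)
print(scores)
```

In this toy example the first case illustrates how proposing an extra option lowers precision without affecting recall, which is the pattern described for ChatGPT.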

4. Discussion

The comparative analysis of fifty-eight surgical cases evaluating three artificial intelligence systems, ChatGPT, DeepSeek, and Gemini, reveals significant differences in their performance metrics, namely accuracy, precision, recall, and F1 score. These findings hold substantial implications for integrating AI-driven decision support tools in clinical settings, particularly in managing hand fractures. ChatGPT demonstrated exceptional accuracy at 98.28% and a notably strong recall of 91.74%, indicating its proficiency in identifying the majority of correct codes. This aligns with broader trends in plastic and reconstructive surgery literature, where AI systems have shown promise in matching or even exceeding human diagnostic and therapeutic accuracy [18,19]. ChatGPT’s ability to propose multiple codes increases the likelihood of including the correct intervention, thereby enhancing recall. However, this tendency also results in a precision of 58.48%, reflecting the inclusion of extra or incorrect options that can lower specificity. This balance underscores ChatGPT’s strength in comprehensive case coverage while highlighting a need for refinement to reduce erroneous predictions further.
DeepSeek exhibited a moderate accuracy of 63.79% and a precision of 61.17%, suggesting that its proposed codes are frequently correct. However, its recall stood at 57.89%, indicating that it may have missed some correct interventions, an aspect that can be critical in complex cases requiring exhaustive management strategies. This performance aligns with findings by Kuo et al., who noted that AI systems can achieve diagnostic and therapeutic accuracy comparable to human clinicians, albeit with variability in precision and comprehensiveness [1]. DeepSeek’s balanced but moderate precision and recall suggest it could serve as an adjunct in clinical decision-making without overwhelming clinicians with excessive options. Nevertheless, its lower recall indicates that certain correct codes might be overlooked, necessitating complementary strategies to ensure broad coverage in clinical practice.
Gemini’s performance was notably lower across all metrics, with an accuracy of 18.97%, a precision of 12.60%, and a recall of 14.68%. This underscores significant challenges in its implementation, suggesting that Gemini struggles to accurately predict correct codes, rendering it less reliable for clinical decision support. Such underperformance mirrors concerns raised by Nogueira et al., who emphasized the risk of algorithmic oversimplification in complex clinical scenarios [2]. The limited effectiveness of Gemini underscores the critical need for advanced training algorithms, larger and more diverse datasets, and potentially different architectures to enhance its predictive capabilities. Gemini may not be suitable for clinical use without substantial improvements, particularly in intricate cases that demand nuanced decision-making.
From a diagnostic standpoint, the results reinforce the literature indicating that AI-based systems like ChatGPT and DeepSeek are capable of effective fracture recognition and risk stratification. Reviews in facial and reconstructive surgery have similarly documented AI’s strengths in rapid imaging analysis and predictive modeling (Espinosa Reyes et al. [3], Souza et al. [5]). In the context of hand trauma, these capabilities may translate into improved triage, expedited operative decision-making, and more targeted postoperative surveillance. However, the application of AI to complex upper-limb fractures remains challenging. Cases involving fragment comminution, concomitant tendon injuries, or crush components demand operative flexibility that current AI systems cannot yet fully replicate. Consistent with Maita et al.’s findings in reconstructive procedures [6], this underscores the necessity for tailored algorithms that accommodate a broad range of clinical variables rather than adopting a one-size-fits-all approach.
Ethical considerations remain paramount in the integration of AI into clinical decision-making. Echoing Pressman et al., any expansion of AI must address potential algorithmic biases, data privacy concerns, and the preservation of patient autonomy [17,20,21,22]. Our study, which involved a limited sample size and a single instance of each AI system, highlights the necessity for further research with larger datasets and more diverse clinical scenarios to validate these preliminary findings. There is also a pressing need for standardized guidelines, as advocated by Wah and colleagues, who caution that disparities in AI access could exacerbate existing gaps in patient care [18]. Furthermore, the potential for racial or demographic bias, as raised by Haider et al. [21,22,23,24], emphasizes the importance of representative training data to ensure equitable AI performance across different patient populations.
This study has several limitations that must be acknowledged. Firstly, the analysis was based on a relatively small sample size of fifty-eight cases, which may limit the generalizability of the findings. Additionally, the reliance on case reports and the encoding of multiple proposed codes using a “+” delimiter present methodological challenges that could influence the accuracy and precision metrics observed. The retrospective design of the study and the inherent variability in case complexity also add layers of complexity that may affect the performance evaluation of the AI systems. Furthermore, the study utilized one instance of each AI platform, which may not capture the full breadth of capabilities and variations that different implementations of the same systems could exhibit.
Future research should address these limitations to provide a more comprehensive understanding of AI performance in clinical decision support. Larger and more diverse datasets are essential for enhancing the robustness and generalizability of the findings, and prospective studies or randomized controlled trials could offer more definitive evidence of AI efficacy in real-time clinical workflows. Analyzing individual codes for each AI system may also help pinpoint specific strengths and weaknesses, enabling more targeted improvements. Leveraging metrics designed explicitly for multi-label classification—such as Hamming Loss, Jaccard Index, and Subset Accuracy—would afford a more nuanced view of AI performance. Developing and integrating tailored algorithms that reflect the intricate nature of clinical cases will be crucial for AI systems to manage complex upper-limb fractures effectively. Addressing potential algorithmic biases and ensuring representative training data remain priorities for promoting equitable AI performance, as emphasized by Haider et al. [21]. Assessing the consistency and reliability of different AI systems using metrics like Fleiss’ Kappa or pairwise Cohen’s Kappa can further clarify inter-system agreement.
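The multi-label and agreement metrics named above can be sketched as follows, treating each case as a set of intervention codes. This is an illustrative sketch in plain Python, not part of the study's analysis; the label set, the example cases, and the two "rater" sequences are hypothetical. scikit-learn provides equivalent functions (`hamming_loss`, `jaccard_score`, `accuracy_score`, `cohen_kappa_score`) for production use.

```python
from typing import List, Sequence, Set


def hamming_loss(preds: List[Set[str]], refs: List[Set[str]],
                 labels: Set[str]) -> float:
    """Fraction of (case, label) slots where prediction and reference disagree.

    Assumes every code in preds/refs is contained in `labels`.
    """
    wrong = sum(len(p ^ r) for p, r in zip(preds, refs))
    return wrong / (len(refs) * len(labels))


def jaccard_index(preds: List[Set[str]], refs: List[Set[str]]) -> float:
    """Mean per-case |intersection| / |union| (1.0 when both sets are empty)."""
    per_case = [len(p & r) / len(p | r) if (p | r) else 1.0
                for p, r in zip(preds, refs)]
    return sum(per_case) / len(per_case)


def subset_accuracy(preds: List[Set[str]], refs: List[Set[str]]) -> float:
    """Fraction of cases where the predicted set equals the reference exactly."""
    return sum(p == r for p, r in zip(preds, refs)) / len(refs)


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two raters giving one categorical label per case."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    categories = set(a) | set(b)
    expected = sum((list(a).count(c) / n) * (list(b).count(c) / n)
                   for c in categories)
    if expected == 1.0:  # both raters always use the same single category
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical label set and cases, NOT study data.
labels = {"KWIRE", "ORIF", "GAMP", "SPLINT"}
refs = [{"KWIRE"}, {"ORIF"}, {"GAMP", "KWIRE"}]
preds = [{"KWIRE", "SPLINT"}, {"ORIF"}, {"SPLINT"}]

print("Hamming loss:   ", hamming_loss(preds, refs, labels))
print("Jaccard index:  ", jaccard_index(preds, refs))
print("Subset accuracy:", subset_accuracy(preds, refs))

# Two hypothetical systems classifying four cases as operative/conservative.
system_a = ["op", "op", "cons", "op"]
system_b = ["op", "cons", "cons", "op"]
print("Cohen's kappa:  ", cohen_kappa(system_a, system_b))
```

Subset accuracy is the strictest of these measures, since a single extra or missing code makes the whole case count as wrong, whereas the Jaccard index gives partial credit for overlapping recommendations.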

5. Conclusions

This study highlights the varying effectiveness of AI-driven decision support tools in managing hand fractures. ChatGPT emerges as a highly accurate system with strong recall, offering comprehensive coverage of correct interventions at the expense of occasionally proposing superfluous codes. DeepSeek strikes a balance between precision and recall but may overlook some interventions, while Gemini’s significantly lower performance underscores the urgent need for substantial algorithmic and data-driven improvements. These findings contribute to the evolving understanding of AI’s potential in clinical decision-making, demonstrating that while AI systems can enhance care, their integration into routine clinical workflows must be approached with a careful assessment of their strengths and limitations. By addressing algorithmic biases, refining precision without undermining recall, and conducting rigorous validation studies, AI in healthcare can progress toward fulfilling its promise of improved patient outcomes, more efficient workflows, and an elevated standard of personalized medicine in the domain of hand trauma and beyond.

Author Contributions

Conceptualization, G.M., I.S. and W.M.R.; methodology, G.M., I.S. and Y.X.; software, M.P.; validation, I.S., Y.X. and W.M.R.; formal analysis, P.S. and Y.X.; investigation, G.M. and R.C.; resources, W.M.R.; data curation, M.P. and P.S.; writing—original draft preparation, G.M. and I.S.; writing—review and editing, Y.X. and W.M.R.; visualization, M.P.; supervision, W.M.R.; project administration, G.M. and I.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This study was conducted according to the Declaration of Helsinki, with institutional ethical approval obtained from the Peninsula Health Human Research Ethics Committee (HREC), Frankston, VIC, Australia (Approval Number: LNR/97071/PH-2023, dated 18 September 2023).

Informed Consent Statement

Patient consent was waived as all patients had already undergone clinical management prior to this study, and all data were fully anonymized. Due to the retrospective nature of the study and the de-identification of all case details, there was no potential for harm or risk to patients. This study was conducted in accordance with the ethical principles outlined in the Declaration of Helsinki and complied with institutional and regulatory guidelines for retrospective analyses.

Data Availability Statement

The data are available upon reasonable request to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest. Ishith Seth serves as the Guest Editor for the Special Issue in which this manuscript is published. However, the editorial process was conducted independently to ensure transparency and integrity.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial Intelligence
LLM: Large Language Model
MC: Metacarpal
MF: Middle Finger
LF: Little Finger
RF: Ring Finger
P1, P2, P3: Proximal, Middle, and Distal Phalanges
DIP: Distal Interphalangeal Joint
IF: Index Finger
ORIF: Open Reduction and Internal Fixation
CRPP: Closed Reduction Percutaneous Pinning
GAMP: Guided Anatomic Mini-invasive Procedure
K-wire: Kirschner Wire
SH II: Salter–Harris Type II
MCPJ: Metacarpophalangeal Joint
VP Repair: Volar Plate Repair
NBR: Nail Bed Repair
DOS: Date of Surgery
AFL: Australian Football League

References

  1. Kuo, R.Y.L.; Harrison, C.; Curran, T.A.; Jones, B.; Freethy, A.; Cussons, D.; Stewart, M.; Collins, G.S.; Furniss, D. Artificial Intelligence in Fracture Detection: A Systematic Review and Meta-Analysis. Radiology 2022, 304, 50–62.
  2. Nogueira, R.; Eguchi, M.; Kasmirski, J.; de Lima, B.V.; Dimatos, D.C.; Lima, D.L.; Glatter, R.; Tran, D.L.; Piccinini, P.S. Machine Learning, Deep Learning, Artificial Intelligence and Aesthetic Plastic Surgery: A Qualitative Systematic Review. Aesthetic Plast. Surg. 2024; Epub ahead of print.
  3. Espinosa Reyes, J.A.; Puerta Romero, M.; Cobo, R.; Heredia, N.; Solís Ruiz, L.A.; Corredor Zuluaga, D.A. Artificial Intelligence in Facial Plastic and Reconstructive Surgery: A Systematic Review. Facial Plast. Surg. 2024, 40, 615–622.
  4. Stephanian, B.; Karki, S.; Debnath, K.; Saltychev, M.; Rossi-Meyer, M.; Kandathil, C.K.; Most, S.P. Role of Artificial Intelligence and Machine Learning in Facial Aesthetic Surgery: A Systematic Review. Facial Plast. Surg. Aesthet. Med. 2024, 26, 679–705.
  5. Souza, S.; Bhethanabotla, R.M.; Mohan, S. Applications of artificial intelligence in facial plastic and reconstructive surgery: A systematic review. Curr. Opin. Otolaryngol. Head Neck Surg. 2024, 32, 222–233.
  6. Maita, K.C.; Avila, F.R.; Torres-Guzman, R.A.; Garcia, J.P.; De Sario Velasquez, G.D.; Borna, S.; Brown, S.A.; Haider, C.R.; Ho, O.S.; Forte, A.J. The usefulness of artificial intelligence in breast reconstruction: A systematic review. Breast Cancer 2024, 31, 562–571.
  7. Devault-Tousignant, C.; Harvie, M.; Bissada, E.; Christopoulos, A.; Tabet, P.; Guertin, L.; Bahig, H.; Ayad, T. The use of artificial intelligence in reconstructive surgery for head and neck cancer: A systematic review. Eur. Arch. Otorhinolaryngol. 2024, 281, 6057–6068.
  8. Taib, B.G.; Karwath, A.; Wensley, K.; Minku, L.; Gkoutos, G.V.; Moiemen, N. Artificial intelligence in the management and treatment of burns: A systematic review and meta-analyses. J. Plast. Reconstr. Aesthet. Surg. 2023, 77, 133–161.
  9. Myers, T.G.; Ramkumar, P.N.; Ricciardi, B.F.; Urish, K.L.; Kipper, J.; Ketonis, C. Artificial Intelligence and Orthopaedics: An Introduction for Clinicians. J. Bone Jt. Surg. Am. 2020, 102, 830–840.
  10. Miragall, M.F.; Knoedler, S.; Kauke-Navarro, M.; Saadoun, R.; Grabenhorst, A.; Grill, F.D.; Ritschl, L.M.; Fichter, A.M.; Safi, A.F.; Knoedler, L. Face the Future-Artificial Intelligence in Oral and Maxillofacial Surgery. J. Clin. Med. 2023, 12, 6843.
  11. Tan, S.; Xin, X.; Wu, D. ChatGPT in medicine: Prospects and challenges: A review article. Int. J. Surg. 2024, 110, 3701–3706.
  12. Oppikofer, C. Artificial Intelligence in Aesthetic Surgery Publishing. Aesthet. Surg. J. 2024, 44, 779–782.
  13. Dhawan, R.; Brooks, K.D. Limitations of Artificial Intelligence in Plastic Surgery. Aesthet. Surg. J. 2024, 44, NP323–NP324.
  14. TerKonda, S.P.; TerKonda, A.A.; Sacks, J.M.; Kinney, B.M.; Gurtner, G.C.; Nachbar, J.M.; Reddy, S.K.; Jeffers, L.L. Artificial Intelligence: Singularity Approaches. Plast. Reconstr. Surg. 2024, 153, 204e–217e.
  15. Bhandari, P.L.; Drolet, B.C.; James, A.J.; Lineaweaver, W.C. Artificial Intelligence and Submissions to Annals of Plastic Surgery. Ann. Plast. Surg. 2024, 92, 487–488.
  16. Park, K.W.; Diop, M.; Willens, S.H.; Pepper, J.P. Artificial Intelligence in Facial Plastics and Reconstructive Surgery. Otolaryngol. Clin. N. Am. 2024, 57, 843–852.
  17. Pressman, S.M.; Borna, S.; Gomez-Cabello, C.A.; Haider, S.A.; Haider, C.; Forte, A.J. AI and Ethics: A Systematic Review of the Ethical Considerations of Large Language Model Use in Surgery Research. Healthcare 2024, 12, 825.
  18. Wah, J.N.K. Revolutionizing surgery: AI and robotics for precision, risk reduction, and innovation. J. Robot. Surg. 2025, 19, 47.
  19. Abi-Rafeh, J.; Xu, H.H.; Kazan, R.; Tevlin, R.; Furnas, H. Large Language Models and Artificial Intelligence: A Primer for Plastic Surgeons on the Demonstrated and Potential Applications, Promises, and Limitations of ChatGPT. Aesthet. Surg. J. 2024, 44, 329–343.
  20. Kapila, A.K.; Georgiou, L.; Hamdi, M. Decoding the Impact of AI on Microsurgery: Systematic Review and Classification of Six Subdomains for Future Development. Plast. Reconstr. Surg. Glob. Open 2024, 12, e6323.
  21. Haider, S.A.; Borna, S.; Gomez-Cabello, C.A.; Pressman, S.M.; Haider, C.R.; Forte, A.J. The Algorithmic Divide: A Systematic Review on AI-Driven Racial Disparities in Healthcare. J. Racial Ethn. Health Disparities 2024; Epub ahead of print.
  22. Schopow, N.; Osterhoff, G.; Baur, D. Applications of the Natural Language Processing Tool ChatGPT in Clinical Practice: Comparative Study and Augmented Systematic Review. JMIR Med. Inform. 2023, 11, e48933.
  23. Marcaccini, G.; Seth, I.; Cuomo, R. Letter on: “Artificial Intelligence: Enhancing Scientific Presentations in Aesthetic Surgery”. Aesthetic Plast. Surg. 2024; Epub ahead of print.
  24. Lin, S.J.; Sun, C.Y.; Chen, D.N.; Kang, Y.N.; Hoang, K.D.; Chen, K.H.; Chen, C. Chatbots for breast cancer education: A systematic review and meta-analysis. Support Care Cancer 2024, 33, 55.
Table 1. Patients’ characteristics and their management advice by surgeons and large language models.
| Case | Key Details | Proposed ChatGPT Approach | Proposed DeepSeek Approach | Proposed Gemini Approach | Surgeon Approach |
|---|---|---|---|---|---|
| 1 | 48 M, punched TV, right 5th MC + little finger mallet | Closed or open reduction with K-wire if unstable, DIP extension for mallet injury | ORIF (Open Reduction Internal Fixation) with volar plating for 5th metacarpal fracture, percutaneous K-wire or screw fixation for bony mallet finger | Splint in slight flexion, closed reduction if significantly displaced, buddy taping for mallet finger | K-wire fixation of 5th MC and mallet finger |
| 2 | 54 M, punched dog, right MF and RF mallet fingers | Closed reduction if small bony fragment, K-wire if unstable + DIP splint | Percutaneous K-wire or screw fixation for displaced bony mallet fractures (MF/RF), splinting if non-displaced | Buddy taping for both mallet fingers | K-wire fixation for both mallet fingers |
| 3 | 14 M, basketball, right MF bony mallet | Closed reduction if the fragment is small, K-wire if unstable, DIP in extension | Extension splinting (6–8 weeks) for non-displaced pediatric bony mallet, percutaneous pinning if displaced | Splint in slight flexion, consider surgical repair if significantly displaced | K-wire fixation of bony mallet fragment + DIP extension splint |
| 4 | 21 M, punched wall, right 4th MC fracture | Assess displacement, if unstable or significantly angulated, ORIF with plate/screws | CRPP (Closed Reduction Percutaneous Pinning) or ORIF with plates/screws for unstable 4th metacarpal fracture | Splint in slight flexion, closed reduction if significantly displaced | Open reduction and internal fixation (plate and screws) for 4th MC |
| 5 | 10 M, footy injury, RLF P1 head fracture | Closed reduction or mini-open approach if needed, K-wire fixation if instability, immobilize in extension | CRPP for displaced P1 head fracture, buddy taping for stable, non-displaced fractures; nailbed repair under magnification + K-wire fixation for P3 fracture | Splint in slight flexion, closed reduction if significantly displaced | GAMP technique + K-wire in P1 head |
| 6 | 69 M, boat vs. LIF crush, LIF P3 fracture + nailbed | Debridement and nailbed repair, K-wire for distal phalanx if unstable | Nailbed repair under magnification + K-wire fixation for P3 fracture | Splint in extension, nailbed repair if necessary | Nailbed repair + K-wire fixation of P3 |
| 7 | 45 M, cricket ball, left RF P2 pilon + P3 mallet | Consider dynamic external fixation (Suzuki) for pilon fracture, K-wire for mallet | ORIF for P2 pilon fracture (mini-fragment screws), splinting or pinning for P3 mallet | Splint in extension, consider surgical repair for pilon fracture | Suzuki + Ishiguro K-wire fixation for pilon (P2) and mallet (P3) |
| 8 | 7 M, car door, left MF P2/P3 fracture | Closed or open reduction if needed, K-wire stabilization, possible GAMP technique | CRPP for displaced pediatric P2/P3 fractures | Splint in slight flexion, closed reduction if significantly displaced | GAMP + K-wire fixation of P2/P3 MF fractures |
| 9 | 53 F, dog bite, right IF P1 fracture (open) | Irrigation/debridement due to bite, ORIF if unstable, antibiotics coverage | Urgent irrigation/debridement + ORIF (plates/screws) for open P1 fracture | Irrigation and debridement, antibiotics, tetanus prophylaxis, splint in slight flexion, potential for surgical exploration | Debridement + ORIF of P1, likely with antibiotics |
| 10 | 49 M, wardrobe door, left 5th MC neck (open) | Washout closed/open reduction, K-wire if angulated/unstable | Debridement + ORIF with plates/screws for open 5th metacarpal neck fracture | Irrigation and debridement, antibiotics, tetanus prophylaxis, splint in slight flexion, potential for surgical exploration | Washout + K-wire fixation (two cross K-wires) of open 5th MC neck |
| 11 | 9 M, basketball, left LF P1 fracture | Closed or open reduction if displaced, GAMP or K-wire fixation, splinting | CRPP for displaced pediatric P1 fracture | Splint in slight flexion, closed reduction if significantly displaced | GAMP procedure for P1 fracture, likely with protected mobilization |
| 12 | 69 M, fall from electric scooter, left 3rd and 4th MC fractures | If significantly displaced, ORIF with plates/screws, plus post-op rehab | ORIF with plates/screws for multiple metacarpal fractures | Splint in slight flexion, closed reduction if significantly displaced, consider surgical fixation if multiple fractures | Open reduction and plating of 3rd and 4th metacarpals |
| 13 | 32 F, hit fridge, right RF P1 fracture | Open reduction if unstable, miniplate or screws fixation, short splint post-op | ORIF (mini-fragment screws) for unstable P1 fracture | Splint in slight flexion, closed reduction if significantly displaced | ORIF (plate and screws or small fragment system) on right RF P1 |
| 14 | 32 M, punched steel beam, right RF and LF MC fractures | Closed or open reduction depending on displacement, GAMP or K-wire if necessary | ORIF for displaced RF/LF metacarpal fractures | Splint in slight flexion, closed reduction if significantly displaced, consider surgical fixation if multiple fractures | GAMP + K-wire fixation of 4th and 5th MC fractures |
| 15 | 16 M, punch, right 5th MC fracture | Assess displacement, if closed reduction fails, ORIF with plate and screws | CRPP or ORIF (based on angulation >30°) | Splint in slight flexion, closed reduction if significantly displaced | ORIF (plate and screws) on 5th MC + standard rehab protocol |
| 16 | 35 M, AFL injury, left RF P1 fracture | Open reduction if significantly displaced, stable fixation with miniplate or screws | ORIF with mini-fragment screws for unstable P1 fracture | Splint in slight flexion, closed reduction if significantly displaced | ORIF (plate and screws) on left RF P1 + guided physiotherapy |
| 17 | 47 F, punch, left 5th MC fracture | Check the degree of angulation, if unstable then ORIF, follow-up X-rays | ORIF with volar plating for 5th metacarpal fracture | Splint in slight flexion, closed reduction if significantly displaced | ORIF (plate and screws) on 5th MC + standard post-op rehab |
| 18 | 32 M, futsal ball, right 1st MC fracture | If unstable base/shaft fracture, open reduction and fix with plate or K-wire, then splint | ORIF (Bennett’s fracture protocol: lag screws or K-wires) | Splint in slight flexion, closed reduction if significantly displaced | ORIF + additional K-wire for the stability of the 1st MC |
| 19 | 6 M, football, DOS 25/4, left RF P1 neck | Closed/mini-open reduction, K-wire if unstable, protect growth plate | CRPP for pediatric P1 neck fracture | Splint in slight flexion, closed reduction if significantly displaced | Left RF P1 neck GAMP |
| 20 | 22 M, punch, DOS 28/4, right IF MC fracture | Closed or open reduction if unstable, GAMP + K-wire, short immobilization | ORIF with plates/screws for unstable metacarpal fracture | Splint in slight flexion, closed reduction if significantly displaced | Right IF MC GAMP + K-wire |
2119 F, AFL fist vs. ground, DOS 1/5, Right RF MC neckClosed/open reduction for displacement, GAMP + K-wire for MC neckCRPP or ORIF for metacarpal neck fracture (based on rotation/angulation)Splint in slight flexion, closed reduction if significantly displacedRight RF MC neck GAMP + K-wire
2214 F, AFL, DOS 1/5, right LF P1 radial condyleOpen reduction to restore joint, small screws/plate fixationORIF with mini-screws for radial condyle fractureSplint in appropriate position, consider further imaging (MRI) if neurovascular compromiseORIF radial condyle of LF P1
2315 M, football, DOS 1/5, right LF P1 fractureCheck displacement, GAMP for P1, protect growth plate, short immobilizationCRPP for pediatric P1 fractureSplint in slight flexion, closed reduction if significantly displacedRight LF P1 GAMP
2476 M, FOOSH, DOS 3/5, left thumb P3 fractureOpen/closed reduction if unstable, K-wire, splint afterwardsSplinting for non-displaced P3, K-wire fixation if displacedSplint in extension, consider surgical repair if significantly displacedLeft thumb P3 open reduction + K-wire
259 F, FOOSH, DOS 7/5, left LF MC fractureCheck growth plate, if displaced open reduction + K-wire, short castCRPP for pediatric metacarpal fractureSplint in slight flexion, closed reduction if significantly displacedLeft LF MC open reduction + K-wire
2622 F, FOOSH, DOS 7/5, left thumb MC baseIf the base is unstable (e.g., Bennett-type), open reduction + plate/screwsORIF (e.g., lag screws for Rolando/Bennett’s fractures)Splint in appropriate position, consider surgical fixation if unstableLeft thumb MC base ORIF
2717 M, basketball, DOS 25/4, right thumb Bennett’sOpen/closed reduction for Bennett’s fracture, GAMP or plate, K-wire if unstableORIF with screws/K-wires for Bennett’s fractureClosed reduction and percutaneous pinningRight thumb Bennett’s GAMP + K-wire
289 M, AFL, DOS 11/5, right ring finger P2 neckIf significantly angulated, GAMP fixation, protect growth plateCRPP for pediatric P2 neck fractureSplint in slight flexion, closed reduction if significantly displacedRight ring finger P2 neck GAMP
2933 F, FOOSH, DOS 11/5, left RF fractureOpen reduction if unstable, plate/screws or K-wire, short splintORIF for unstable RF fractureSplint in slight flexion, closed reduction if significantly displacedLeft RF ORIF (plate/screws)
3059 M, AFL, DOS 11/5, right LF central slip + volar P2Repair central slip, fix volar P2 if needed (K-wire), open approachCentral slip tendon repair + ORIF for volar P2 fractureSplint in extension, consider surgical repairRight LF central slip repair + K-wire volar P2
3123 M, punched fridge, DOS 15/5, left 5th MC neckCheck angulation, GAMP or K-wire if unstable, short immobilizationORIF with plates/screws for 5th metacarpal neck fractureSplint in slight flexion, closed reduction if significantly displacedLeft 5th MC neck GAMP
3233 M, crush steel, DOS 18/5, right IF P2 fractureWashout if open, K-wire if unstable fragment, consider antibioticsORIF for unstable P2 fractureSplint in slight flexion, closed reduction if significantly displacedRight IF P2 washout + K-wire
3317 F, AFL, DOS 18/5, left IF MCPJ dislocationOpen reduction of MCPJ, volar plate repair (Kaplan’s lesion), possible pinClosed reduction ± K-wire fixation for MCPJ dislocationClosed reduction under anesthesia, splint in appropriate positionLeft IF MCPJ dislocation reduction + open volar plate repair
3426 M, AFL, DOS 22/5, left IF P1 fractureIf displaced or articular, open reduction + screws/plate, short immobilizationORIF for displaced P1 fractureSplint in slight flexion, closed reduction if significantly displacedLeft IF P1 ORIF
3559 M, circular saw, DOS 25/5, left MF nailbed + P3Debridement, nailbed repair, K-wire distal phalanx if unstable, antibioticsNailbed repair + P3 K-wire fixationNailbed repair, splint in extensionLeft MF nailbed repair + K-wire P3
3625 M, AFL, DOS 29/5, left LF P1If the fracture is unstable, ORIF with miniplate/screws, short immobilizationORIF with mini-fragment screws for P1 fractureSplint in slight flexion, closed reduction if significantly displacedLeft LF P1 ORIF
3726 F, dodgeball, DOS 29/5, right LF P1Open reduction if needed, stable fixation (plate/screws), short splintORIF for displaced P1 fractureSplint in slight flexion, closed reduction if significantly displacedRight LF P1 ORIF with rigid fixation
3811 M, basketball, DOS 30/5, left LF P2 SH IIProtect growth plate, closed/mini-open reduction, GAMP or K-wireCRPP for Salter–Harris II fractureSplint in extension, consider surgical repairLeft LF P2 SH II GAMP
3914 M, AFL, DOS 31/5, right MF malletUnstable mallet or large fragment → K-wire, else DIP splintExtension splinting ± percutaneous pinning for bony malletBuddy taping to adjacent fingerRight MF mallet K-wire fixation
4036 F, crush injury, DOS 3/6, left LF P3Closed/mini-open reduction if needed, K-wire for unstable distal fragmentK-wire fixation or splinting for P3 fractureSplint in extension, consider surgical repair if significantly displacedLeft LF P3 K-wire fixation
4119 M, AFL, DOS 4/6, left 5th MC fractureAssess displacement, if significant, ORIF w/plate/screws, short immobilizationORIF for 5th metacarpal fractureSplint in slight flexion, closed reduction if significantly displacedLeft 5th MC ORIF (plate and screws)
426 M, AFL, DOS 5/6, left RF fractureOpen/closed reduction for phalanx, K-wire if unstable, protect growth plateCRPP for pediatric RF fractureSplint in slight flexion, closed reduction if significantly displacedOpen reduction + K-wires on left ring finger
4330 F, fall, DOS 5/6, left 5th MCIf the metacarpal neck/shaft is displaced, GAMP or K-wire/plate, short immobilizationORIF for unstable 5th metacarpal fractureSplint in slight flexion, closed reduction if significantly displacedGAMP fixation of left 5th metacarpal
4463 F, fall, DOS 17/6, left thumb P2Open reduction if articular, fix w/mini-screws/plate, short splintORIF for thumb P2 fractureSplint in appropriate position, consider surgical repair if significantly displacedLeft thumb P2 ORIF w/hardware
4539F, 8/6, dog lead, DOS 17/6, right RF P1Open reduction for displaced P1, small plate or screws, buddy tapingORIF for unstable P1 fractureSplint in slight flexion, closed reduction if significantly displacedRight RF P1 ORIF (plate/screws)
4630 F, AFL (kicked), DOS 18/6, right thumb P1Open reduction of proximal phalanx, plate/screws, short thumb splintORIF for thumb P1 fractureSplint in appropriate position, consider surgical repair if significantly displacedRight thumb P1 ORIF
4761 F, fall from horse, DOS 21/6, left hamate + capitateCarpal fractures may need open reduction or GAMP, K-wire or mini-screws okORIF for carpal fractures (hamate + capitate)Splint in slight flexion, consider surgical fixationLeft hamate + capitate GAMP + K-wire
4836 M, crush injury, DOS 21/6, right 5th MCCheck displacement, if unstable, GAMP or K-wire, short immobilizationORIF for 5th metacarpal fractureSplint in slight flexion, closed reduction if significantly displacedRight 5th MC GAMP + K-wire
4928 M, basketball, DOS 23/6, left LF P3 baseIf large or joint-involved fragment, GAMP or K-wire, DIP splintK-wire fixation for P3 base fractureSplint in extension, consider surgical repair if significantly displacedLeft LF P3 base GAMP + K-wire
5059 F, softball fall, DOS 24/6, left MF open malletOpen mallet: nailbed repair if needed, K-wire DIP, watch for infectionDebridement + K-wire fixation for open mallet injurySplint in extension, surgical repair of mallet fingerLeft MF NBR + K-wire for open mallet
5128 M, AFL, DOS 3/7, left LF PIPJ fracture/dislocationOpen reduction to restore joint, K-wire if unstable, volar plate repairORIF ± external fixation for PIPJ fracture-dislocationClosed reduction under anesthesia, splint in appropriate positionLeft LF PIPJ open reduction + K-wire + VP repair
5272 M, cricket ball, DOS 3/7, right MF P3 fractureMallet-type or distal phalanx fracture → K-wire if unstable, DIP splint otherwiseSplinting for non-displaced P3 fractureSplint in extension, consider surgical repair if significantly displacedRight MF P3 K-wire fixation
5323 M, basketball, DOS 10/7, right 5th MCIf significantly unstable, open reduction w/plate/screws or GAMP, short splintORIF for 5th metacarpal fractureSplint in slight flexion, closed reduction if significantly displacedRight 5th MC ORIF w/hardware
5428 M, axial load, DOS 12/7, left IF P1 fractureOpen reduction if displaced/intra-articular, mini-plate or screws, early mobilizationORIF for unstable P1 fractureSplint in slight flexion, closed reduction if significantly displacedLeft IF P1 ORIF (plate/screws)
5514 M, basketball, DOS 28/7, right RF distal injuryNailbed repair or debridement if open, K-wire for unstable distal phalanx, splint DIPCRPP for distal phalangeal injuryX-ray to assess extent of injury, splint or cast depending on findingsRight RF NBR + P3 K-wire
5664 M, cricket ball, DOS 25/7, left MF PIPJ dislocationRe-open if residual subluxation, volar plate repair, possible K-wire across PIPJClosed reduction ± K-wire fixation for PIPJ dislocationClosed reduction under anesthesia, splint in extensionRe-open reduction of left MF PIPJ + VP repair
5715 M, basketball, DOS 24/7, right MF P2 fractureOpen reduction if displaced/intra-articular, small plate/screws, short immobilizationORIF for displaced P2 fractureSplint in slight flexion, closed reduction if significantly displacedRight MF P2 ORIF
5830 M, punched TV, DOS 19/7, right
4th and 5th MC + hamate
Check 4th/5th MC alignment + hamate, K-wire or GAMP if unstable, short castORIF for 4th/5th metacarpal and hamate fracturesSplint in slight flexion, consider surgical fixation for multiple fracturesRight 4th + 5th GAMP + hamate K-wire
Abbreviations: MC = Metacarpal; MF = Middle Finger; LF = Little Finger; RF = Ring Finger; IF = Index Finger; P1, P2, P3 = Proximal, Middle, and Distal Phalanges; DIP = Distal Interphalangeal; PIPJ = Proximal Interphalangeal Joint; MCPJ = Metacarpophalangeal Joint; ORIF = Open Reduction Internal Fixation; CRPP = Closed Reduction Percutaneous Pinning; GAMP = specific surgical technique for fracture stabilization; K-wire = Kirschner Wire; SH II = Salter–Harris Type II; VP repair = Volar Plate Repair; NBR = Nail Bed Repair; FOOSH = Fall On Outstretched Hand; DOS = Date of Surgery; AFL = Australian Football League.
Table 2. Coded interventions.
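| Case | Proposed Code by ChatGPT | Proposed Code by DeepSeek | Proposed Code by Gemini | Surgeon Approach |
|---|---|---|---|---|
| 1 | 1 + 2 + 3 + 5 | 2 + 4 + 3 | 6 + 1 + 7 | 3 |
| 2 | 1 + 3 + 5 | 3 + 5 | 7 | 3 |
| 3 | 1 + 3 + 5 | 5 + 3 | 6 + 2 | 3 + 5 |
| 4 | 2 + 4 | 9 + 2 + 4 | 6 + 1 | 2 + 4 |
| 5 | 1 + 2 + 3 + 5 | 9 + 7 | 10 + 3 | 8 + 3 |
| 6 | 11 + 10 + 3 | 10 + 3 | 5 + 10 | 10 + 3 |
| 7 | 17 + 3 | 2 + 4 + 5 + 3 | 5 + 2 + 4 | 17 + 3 |
| 8 | 1 + 2 + 3 + 8 | 9 | 6 + 1 | 8 + 3 |
| 9 | 11 + 2 + 4 + 12 | 11 + 2 + 4 | 11 + 12 + 13 + 6 + 2 | 11 + 2 + 4 + 12 |
| 10 | 11 + 1 + 2 + 3 | 11 + 2 + 4 | 11 + 12 + 13 + 6 + 2 | 11 + 3 |
| 11 | 1 + 2 + 8 + 3 | 9 | 6 + 1 | 8 |
| 12 | 2 + 4 + 16 | 2 + 4 | 6 + 1 + 2 + 4 | 2 + 4 |
| 13 | 2 + 4 + 6 | 2 + 4 | 6 + 1 | 2 + 4 |
| 14 | 1 + 2 + 8 + 3 | 2 + 4 | 6 + 1 | 8 + 3 |
| 15 | 1 + 2 + 4 | 9 + 2 + 4 | 6 + 1 | 2 + 4 + 16 |
| 16 | 2 + 4 | 2 + 4 | 6 + 1 | 2 + 4 + 16 |
| 17 | 2 + 4 | 2 + 4 | 6 + 1 | 2 + 4 + 16 |
| 18 | 2 + 4 + 3 | 2 + 4 | 6 + 1 | 2 + 4 + 3 |
| 19 | 1 + 2 + 3 | 9 | 6 + 1 | 8 |
| 20 | 1 + 2 + 8 + 3 | 2 + 4 | 6 + 1 | 8 + 3 |
| 21 | 1 + 2 + 8 + 3 | 9 + 2 + 4 | 6 + 1 | 8 + 3 |
| 22 | 2 + 4 | 2 + 4 | 6 + 1 | 2 + 4 |
| 23 | 1 + 8 | 9 | 6 + 1 | 8 |
| 24 | 1 + 2 + 3 + 5 | 3 + 5 | 5 + 2 | 2 + 3 |
| 25 | 2 + 3 | 9 | 6 + 1 | 2 + 3 |
| 26 | 2 + 4 | 2 + 4 | 2 + 4 | 2 + 4 |
| 27 | 1 + 2 + 4 + 8 + 3 | 2 + 4 + 3 | 1 + 3 | 8 + 3 |
| 28 | 1 + 8 | 9 | 6 + 1 | 8 |
| 29 | 2 + 4 + 3 | 2 + 4 | 6 + 1 | 2 + 4 |
| 30 | 14 + 3 + 2 | 14 + 2 + 4 | 5 + 2 | 14 + 3 |
| 31 | 1 + 8 + 3 | 2 + 4 | 6 + 1 | 8 |
| 32 | 11 + 3 + 12 | 2 + 4 | 6 + 1 | 11 + 3 |
| 33 | 2 + 15 | 1 + 3 | 1 + 5 | 2 + 15 |
| 34 | 2 + 4 | 2 + 4 | 6 + 1 | 2 + 4 |
| 35 | 11 + 10 + 3 + 12 | 10 + 3 | 10 + 5 | 10 + 3 |
| 36 | 2 + 4 | 2 + 4 | 6 + 1 | 2 + 4 |
| 37 | 2 + 4 | 2 + 4 | 6 + 1 | 2 + 4 |
| 38 | 1 + 2 + 8 + 3 | 9 | 5 + 2 | 8 |
| 39 | 3 + 5 | 5 + 3 | 7 | 3 |
| 40 | 1 + 2 + 3 | 3 + 5 | 5 + 2 | 3 |
| 41 | 2 + 4 | 2 + 4 | 6 + 1 | 2 + 4 |
| 42 | 1 + 2 + 3 | 9 | 6 + 1 | 2 + 3 |
| 43 | 8 + 3 + 4 | 2 + 4 | 6 + 1 | 8 |
| 44 | 2 + 4 | 2 + 4 | 5 + 2 + 4 | 2 + 4 |
| 45 | 2 + 4 + 7 | 2 + 4 | 6 + 1 | 2 + 4 |
| 46 | 2 + 4 | 2 + 4 | 6 + 1 | 2 + 4 |
| 47 | 2 + 8 + 3 + 4 | 2 + 4 | 6 + 2 + 4 | 8 + 3 |
| 48 | 1 + 2 + 8 + 3 | 2 + 4 | 6 + 1 | 8 + 3 |
| 49 | 8 + 3 + 5 | 3 | 5 + 2 | 8 + 3 |
| 50 | 10 + 3 | 11 + 3 | 5 + 2 + 3 | 10 + 3 |
| 51 | 2 + 3 + 15 | 2 + 4 + 17 | 1 + 5 | 2 + 3 + 15 |
| 52 | 3 + 5 | 5 | 5 + 2 | 3 |
| 53 | 2 + 4 + 8 + 5 | 2 + 4 | 6 + 1 | 2 + 4 |
| 54 | 2 + 4 + 16 | 2 + 4 | 6 + 1 | 2 + 4 |
| 55 | 10 + 11 + 3 + 5 | 9 | 5? | 10 + 3 |
| 56 | 2 + 15 + 3 | 1 + 3 | 1 + 5 | 2 + 15 |
| 57 | 2 + 4 | 2 + 4 | 6 + 1 | 2 + 4 |
| 58 | 3 + 8 | 2 + 4 | 6 + 1 + 2 + 4 | 8 + 3 |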
This table includes five columns. The “Case” column is a sequential identifier from 1 to 58. The three “Proposed Code” columns give the numeric codes for the treatment approach recommended by each AI model: 1 corresponds to “Closed reduction,” 2 to “Open reduction,” 3 to “K-wire fixation,” 4 to “ORIF (plate/screws),” 5 to “Splint in extension,” 6 to “Splint in flexion,” 7 to “Buddy taping,” 8 to “GAMP technique,” 9 to “CRPP (Closed Reduction Percutaneous Pinning),” 10 to “Nailbed repair,” 11 to “Debridement/Irrigation (washout),” 12 to “Antibiotic coverage,” 13 to “Tetanus prophylaxis,” 14 to “Central slip tendon repair,” 15 to “Volar plate repair,” 16 to “Physiotherapy/Rehabilitation,” and 17 to “External fixation (e.g., Suzuki frame).” The “Surgeon Approach” column uses the same coding scheme to indicate the procedure that was actually performed.
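To make the coded comparison concrete, the sketch below scores a single case by treating each cell of Table 2 as a set of intervention codes. This is one plausible reading of how proposed and performed codes could be compared, not a description of the authors' actual procedure; the helper names `parse_codes` and `case_scores` are hypothetical.

```python
from typing import Set, Tuple


def parse_codes(cell: str) -> Set[int]:
    """Turn a Table 2 cell such as '1 + 2 + 3 + 5' into the set {1, 2, 3, 5}."""
    return {int(token) for token in cell.split("+") if token.strip()}


def case_scores(proposed: Set[int], performed: Set[int]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one case, treating codes as set members.

    Assumption: a proposed code counts as correct only if the surgeon's
    coded approach contains the same code.
    """
    matched = len(proposed & performed)
    precision = matched / len(proposed) if proposed else 0.0
    recall = matched / len(performed) if performed else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Case 8 in Table 2: ChatGPT proposed 1 + 2 + 3 + 8, the surgeon performed 8 + 3.
precision, recall, f1 = case_scores(parse_codes("1 + 2 + 3 + 8"), parse_codes("8 + 3"))
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: precision=0.50 recall=1.00 f1=0.67
```

Under this reading, case 8 shows the pattern described for ChatGPT: both performed interventions are covered (full recall), but two extra options halve the precision. The uncertain Gemini entry for case 55 (“5?”) is not a plain integer and would need explicit handling before parsing.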
Table 3. Statistical analysis.
| AI | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) |
|---|---|---|---|---|
| ChatGPT | 98.28 | 58.48 | 92.14 | 71.38 |
| DeepSeek | 64.19 | 61.17 | 58.29 | 59.52 |
| Gemini | 19.37 | 13.00 | 15.08 | 13.55 |
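As a reading aid for Table 3, the sketch below recomputes the harmonic mean of each reported precision and recall and prints it next to the reported F1 score. The function name `harmonic_f1` is hypothetical, and the check says nothing definitive about how the authors aggregated their scores.

```python
def harmonic_f1(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) in percent, copied from Table 3.
reported = {
    "ChatGPT": (58.48, 92.14, 71.38),
    "DeepSeek": (61.17, 58.29, 59.52),
    "Gemini": (13.00, 15.08, 13.55),
}

for model, (precision, recall, f1) in reported.items():
    recomputed = harmonic_f1(precision, recall)
    print(f"{model}: harmonic mean {recomputed:.2f} vs reported F1 {f1:.2f}")
```

In each row the recomputed value comes out slightly above the reported F1. One possible explanation, offered only as an assumption, is that F1 was computed per case and then averaged across the 58 cases rather than derived from the averaged precision and recall.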
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Marcaccini, G.; Seth, I.; Xie, Y.; Susini, P.; Pozzi, M.; Cuomo, R.; Rozen, W.M. Breaking Bones, Breaking Barriers: ChatGPT, DeepSeek, and Gemini in Hand Fracture Management. J. Clin. Med. 2025, 14, 1983. https://doi.org/10.3390/jcm14061983
