- Article
Large Language Models vs. Professional Resources for Post-Treatment Quality-of-Life Questions in Head and Neck Cancer: A Cross-Sectional Comparison
- Ali Alabdalhussein, Mohammed Hasan Al-Khafaji, Shazaan Nadeem, and 10 others
Background: Patients increasingly use large language models (LLMs) such as ChatGPT, Gemini, and Claude to address their health concerns. However, it remains unclear whether the readability, understandability, actionability, and empathy of these responses meet standard guidelines. In this study, we aimed to address these concerns and compare the outputs of LLMs with those of professional resources.

Methods: We conducted a comparative cross-sectional study, following the relevant items of the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) checklist for cross-sectional studies and using 14 patient-style questions. These questions were collected from professional platforms to represent each domain. We derived the 14 domains from validated quality-of-life instruments (EORTC QLQ-H&N35, UW-QOL, and FACT-H&N). Responses to the 14 questions were obtained from three LLMs (ChatGPT-4o, Gemini 2.5 Pro, and Claude Sonnet 4) and two professional sources (Macmillan Cancer Support and CURE Today). All responses were evaluated using the Patient Education Materials Assessment Tool (PEMAT), the DISCERN instrument, and the Empathic Communication Coding System (ECCS). Readability was assessed using the Flesch Reading Ease and Flesch-Kincaid Grade Level metrics. Statistical analysis included one-way ANOVA and Tukey's HSD test for group comparisons.

Results: No differences were found in quality (DISCERN), understandability and actionability (PEMAT), or empathy (ECCS) between LLMs and professional resources. However, professional resources outperformed the LLMs in readability.

Conclusions: We found that LLMs (ChatGPT, Gemini, Claude) can produce patient information comparable to professional resources in terms of quality, understandability, actionability, and empathy. However, readability remains a key limitation, as LLM-generated responses often require simplification to align with recommended health-literacy standards.
28 November 2025
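The two readability metrics named in the Methods are closed-form formulas over word, sentence, and syllable counts. A minimal sketch of how such scores are computed (the function names are illustrative, not taken from the study; real tools also need a syllable counter, which is omitted here by passing counts directly):

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate U.S. school grade needed to read the text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Hypothetical example: a 100-word passage with 5 sentences and 150 syllables.
print(flesch_reading_ease(100, 5, 150))
print(flesch_kincaid_grade(100, 5, 150))
```

Scores around 60-70 on the Flesch Reading Ease scale correspond to plain English; patient-education guidance generally targets the lower grade levels, which is the gap the study reports for LLM output.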
