Performance of ChatGPT in Pediatric Audiology as Rated by Students and Experts
Abstract
1. Introduction
2. Materials and Methods
Statistical Analysis
3. Results
3.1. General Overview of ChatGPT’s Response Ratings
3.2. Question-Specific Analysis of ChatGPT’s Response Ratings
3.3. Assessment of the Usefulness of ChatGPT
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
No. | Question |
---|---|
1 | What is the CROS system? |
2 | What is conductive hearing loss? |
3 | What frequencies are responsible for the reception of speech sounds in children? |
4 | What can be the causes of otitis in a child? |
5 | How to interpret the test of tonal audiometry, what is the norm of hearing? |
6 | What is the ABR test in a child? |
7 | What is newborn hearing screening? |
8 | What is the course of exudative otitis media in a child? |
9 | How can hearing be tested in a one-year-old child? |
10 | What is the verbal audiometry * test? |
11 | What types of tympanograms are distinguished in impedance audiometry? |
12 | For what purpose is an ear impression performed in children? |
13 | How to determine air conduction in the audiogram? |
14 | What should be done in the case of an abnormal result of hearing screening in a newborn? |
15 | What do the yellow and blue certificates in the child’s health book mean in the context of hearing screening of children in Poland? |
16 | What are the contraindications to the use of air-conduction hearing aids? |
17 | What are the most effective methods of treatment of exudative otitis media in a child? |
18 | What is the result of a verbal audiometry * test and how to interpret it? |
19 | What are the results of otoscopy for otosclerosis? |
20 | What implantable hearing prostheses can be used in children? |
No. | Question |
---|---|
1a | How do you rate the usefulness of the ChatGPT tool for patients as a source of information? |
1b | How do you rate the usefulness of the ChatGPT tool for students in education? |
1c | How do you assess the usefulness of the ChatGPT tool for specialists in consulting difficult, specialized cases? |
2 | How do you assess the possibility of using the ChatGPT tool in your professional work? |
3 | How do you assess the level of risk to the patient in using ChatGPT to obtain information? |
Criterion | Group | Polish Version M | Polish Version SD | English Version M | English Version SD | Group Effect (F; p) | Language Version Effect (F; p) | Interaction Effect (F; p) |
---|---|---|---|---|---|---|---|---|
Correctness | Student | 4.02 | 0.70 | 3.98 | 0.45 | 3.81; p = 0.059 | 0.54; p = 0.469 | 0.01; p = 0.914 |
Correctness | Expert | 3.72 | 0.41 | 3.66 | 0.43 | | | |
Relevance | Student | 4.17 | 0.64 | 4.11 | 0.41 | 0.21; p = 0.648 | 3.62; p = 0.066 | 0.69; p = 0.411 |
Relevance | Expert | 4.03 | 0.59 | 4.00 | 0.56 | | | |
Completeness | Student | 3.84 | 0.73 | 3.67 | 0.48 | 5.00; p = 0.032 * | 1.23; p = 0.274 | 0.02; p = 0.887 |
Completeness | Expert | 3.45 | 0.45 | 3.92 | 0.55 | | | |
Linguistic accuracy | Student | 3.93 | 0.83 | 4.16 | 0.65 | 5.39; p = 0.044 * | 19.07; p < 0.001 ** | 0.53; p = 0.470 |
Linguistic accuracy | Expert | 3.45 | 0.45 | 3.78 | 0.51 | | | |
Criterion | Group | Polish Version M | Polish Version SD | English Version M | English Version SD | Group Effect (F; p) | Language Version Effect (F; p; η²) | Interaction Effect (F; p; η²) |
---|---|---|---|---|---|---|---|---|
Correctness | Student | 4.70 | 0.66 | 1.60 | 0.75 | 0.06; p = 0.807 | 282.56; p < 0.001 **; η² = 0.89 | 0.08; p = 0.785 |
Correctness | Expert | 4.69 | 0.60 | 1.69 | 0.79 | | | |
Relevance | Student | 4.70 | 0.66 | 1.65 | 0.81 | 3.62; p = 0.065; η² = 0.10 | 120.77; p < 0.001 **; η² = 0.78 | 6.87; p = 0.013 *; η² = 0.17 |
Relevance | Expert | 4.56 | 0.73 | 2.69 | 1.54 | | | |
Completeness | Student | 4.20 | 1.20 | 1.45 | 0.89 | 0.91; p = 0.346 | 110.48; p < 0.001 **; η² = 0.77 | 0.00; p > 0.999 |
Completeness | Expert | 4.38 | 0.72 | 1.63 | 0.89 | | | |
Linguistic accuracy | Student | 4.30 | 1.03 | 3.80 | 1.28 | 0.08; p = 0.786 | 2.01; p = 0.165 | 0.072; p = 0.401 |
Linguistic accuracy | Expert | 4.19 | 0.54 | 4.06 | 1.12 | | | |
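
The effect sizes in the table above can be cross-checked against the F statistics. The sketch below assumes that η² denotes partial eta squared and that every effect was tested with 1 and 34 degrees of freedom; neither assumption is stated in the table, and the function name is illustrative. Under those assumptions, partial eta squared follows from F as F·df_effect / (F·df_effect + df_error).

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# ASSUMPTION: degrees of freedom are not given in the table; (1, 34) is a guess.
DF_EFFECT, DF_ERROR = 1, 34

# (effect, F from the table, eta squared from the table)
reported = [
    ("Correctness, language version", 282.56, 0.89),
    ("Relevance, group", 3.62, 0.10),
    ("Relevance, language version", 120.77, 0.78),
    ("Relevance, interaction", 6.87, 0.17),
    ("Completeness, language version", 110.48, 0.77),
]

for label, f_value, eta_reported in reported:
    eta = partial_eta_squared(f_value, DF_EFFECT, DF_ERROR)
    print(f"{label}: computed {eta:.3f}, reported {eta_reported:.2f}")
```

With these assumed degrees of freedom, each computed value falls within about 0.01 of the reported one, which is consistent with rounding of the published F statistics.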
Question No. | All Participants M | All Participants SD | Students M | Students SD | Experts M | Experts SD | U; p |
---|---|---|---|---|---|---|---|
1a | 3.78 | 0.72 | 3.90 | 0.72 | 3.63 | 0.72 | 120.0; p = 0.160 |
1b | 3.25 | 0.97 | 3.55 | 0.89 | 2.88 | 0.96 | 96.0; p = 0.031 * |
1c | 2.06 | 0.83 | 2.20 | 0.89 | 1.87 | 0.72 | 127.0; p = 0.264 |
2 | 2.86 | 0.80 | 3.05 | 0.83 | 2.63 | 0.72 | 117.0; p = 0.143 |
3 | 2.94 | 0.75 | 3.00 | 0.86 | 2.88 | 0.62 | 151.0; p = 0.753 |
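
The U column above presumably reports a Mann-Whitney U statistic comparing students with experts, although the table itself does not name the test. As a minimal sketch of how such a statistic is obtained, the code below counts, over all student-expert pairs, how often one rating exceeds the other, with ties counted as one half. The ratings are hypothetical 1-5 values chosen for illustration and are not data from the study.

```python
from itertools import product


def mann_whitney_u(x, y):
    """Return (U_x, U_y); U_x counts pairs in which x exceeds y, ties count 0.5."""
    u_x = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a, b in product(x, y)
    )
    # The two statistics always sum to the number of pairs, len(x) * len(y).
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# HYPOTHETICAL ratings on a 1-5 scale, not taken from the study.
students = [4, 3, 4, 5, 3, 4]
experts = [3, 2, 4, 3, 3]

u_students, u_experts = mann_whitney_u(students, experts)
print(f"U(students) = {u_students}, U(experts) = {u_experts}")
```

Which of the two U values is reported is a matter of convention; many packages report either the smaller one or the one belonging to the first sample, so a published U should be read alongside the group sizes.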
Question No. | Correctness | Relevance | Completeness | Linguistic Accuracy |
---|---|---|---|---|
1a | 0.30 | 0.26 | 0.30 | 0.33 * |
1b | 0.51 ** | 0.23 | 0.43 ** | 0.39 * |
1c | 0.49 ** | 0.26 | 0.38 * | 0.37 * |
2 | 0.38 * | 0.26 | 0.41 * | 0.50 ** |
3 | 0.11 | 0.03 | 0.10 | −0.07 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ratuszniak, A.; Gos, E.; Lorens, A.; Skarzynski, P.H.; Skarzynski, H.; Jedrzejczak, W.W. Performance of ChatGPT in Pediatric Audiology as Rated by Students and Experts. J. Clin. Med. 2025, 14, 875. https://doi.org/10.3390/jcm14030875