Article

Comparison of the Accuracy, Completeness, Reproducibility, and Consistency of Different AI Chatbots in Providing Nutritional Advice: An Exploratory Study

1 Department of Medical Science, University of Turin, 10126 Torino, Italy
2 Department of Psychology, University of Turin, 10124 Torino, Italy
3 Dietetic and Clinical Nutrition Unit, Città della Salute e della Scienza Hospital, 10126 Torino, Italy
4 Department of Food Science and Technology, University of Gastronomic Sciences, 12042 Pollenzo, Italy
5 Department of Translational Medicine, University of Ferrara, 44121 Ferrara, Italy
* Author to whom correspondence should be addressed.
J. Clin. Med. 2024, 13(24), 7810; https://doi.org/10.3390/jcm13247810
Submission received: 19 November 2024 / Revised: 11 December 2024 / Accepted: 17 December 2024 / Published: 20 December 2024
(This article belongs to the Section Endocrinology & Metabolism)

Abstract

Background: The use of artificial intelligence (AI) chatbots to obtain healthcare advice has greatly increased in the general population. This study assessed the performance of general-purpose AI chatbots in giving nutritional advice to patients with obesity, with or without multiple comorbidities. Methods: The case of a 35-year-old male with obesity and no comorbidities (Case 1) and the case of a 65-year-old female with obesity, type 2 diabetes mellitus, sarcopenia, and chronic kidney disease (Case 2) were submitted to 10 different AI chatbots on three consecutive days. Accuracy (the ability to provide advice aligned with guidelines), completeness, and reproducibility (replicability of the information over the three days) of the chatbots’ responses were evaluated by three registered dietitians. Nutritional consistency was evaluated by comparing the nutrient content provided by the chatbots with values calculated by dietitians. Results: Case 1: ChatGPT 3.5 demonstrated the highest accuracy rate (67.2%) and Copilot the lowest (21.1%). ChatGPT 3.5 and ChatGPT 4.0 achieved the highest completeness (both 87.3%), whereas Gemini and Copilot recorded the lowest scores (55.6% and 42.9%, respectively). Reproducibility was highest for Chatsonic (86.1%) and lowest for ChatGPT 4.0 (50%) and ChatGPT 3.5 (52.8%). Case 2: Overall accuracy was low, with no chatbot achieving 50% accuracy. Completeness was highest for ChatGPT 4.0 and Claude (both 77.8%) and lowest for Copilot (23.3%). ChatGPT 4.0 and Pi Ai showed the lowest reproducibility. Major inconsistencies concerned the amount of protein recommended: most chatbots simultaneously suggested both reducing and increasing protein intake. Conclusions: General-purpose AI chatbots exhibited limited accuracy, reproducibility, and consistency in giving dietary advice in complex clinical scenarios and cannot replace the work of an expert dietitian.
Keywords: obesity; chatbots; artificial intelligence; dietary plans; dietary advice
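The abstract states that nutritional consistency was assessed by comparing the nutrient content reported by each chatbot with the values recalculated by dietitians, but it does not give the exact comparison criterion. The Python sketch below illustrates one plausible way to express such a comparison, as a signed percentage deviation per nutrient. The class name `NutrientComparison`, the function `flag_inconsistent`, the 10% tolerance, and the example numbers are all hypothetical assumptions for illustration, not the authors' method or data.

```python
from dataclasses import dataclass


@dataclass
class NutrientComparison:
    """One nutrient in one chatbot-generated plan (hypothetical structure)."""
    nutrient: str
    chatbot_value: float    # amount the chatbot states its plan provides
    dietitian_value: float  # amount recalculated by a dietitian for the same plan

    def percent_deviation(self) -> float:
        """Signed deviation of the chatbot figure from the dietitian figure, in percent."""
        if self.dietitian_value == 0:
            raise ValueError("dietitian reference value must be non-zero")
        return 100.0 * (self.chatbot_value - self.dietitian_value) / self.dietitian_value


def flag_inconsistent(rows, tolerance_pct=10.0):
    """Return nutrients whose absolute deviation exceeds an assumed tolerance."""
    return [r.nutrient for r in rows if abs(r.percent_deviation()) > tolerance_pct]


if __name__ == "__main__":
    # Illustrative numbers only; these are NOT data from the study.
    plan = [
        NutrientComparison("energy_kcal", 1800, 1650),
        NutrientComparison("protein_g", 90, 88),
        NutrientComparison("carbohydrate_g", 200, 240),
    ]
    for row in plan:
        print(f"{row.nutrient}: {row.percent_deviation():+.1f}%")
    print("inconsistent:", flag_inconsistent(plan))
```

With these made-up inputs the script prints deviations of +9.1% (energy), +2.3% (protein), and -16.7% (carbohydrate), and flags only carbohydrate under the assumed 10% tolerance. A signed deviation is used so that over- and under-estimation by the chatbot remain distinguishable.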

Share and Cite

MDPI and ACS Style

Ponzo, V.; Rosato, R.; Scigliano, M.C.; Onida, M.; Cossai, S.; De Vecchi, M.; Devecchi, A.; Goitre, I.; Favaro, E.; Merlo, F.D.; et al. Comparison of the Accuracy, Completeness, Reproducibility, and Consistency of Different AI Chatbots in Providing Nutritional Advice: An Exploratory Study. J. Clin. Med. 2024, 13, 7810. https://doi.org/10.3390/jcm13247810

AMA Style

Ponzo V, Rosato R, Scigliano MC, Onida M, Cossai S, De Vecchi M, Devecchi A, Goitre I, Favaro E, Merlo FD, et al. Comparison of the Accuracy, Completeness, Reproducibility, and Consistency of Different AI Chatbots in Providing Nutritional Advice: An Exploratory Study. Journal of Clinical Medicine. 2024; 13(24):7810. https://doi.org/10.3390/jcm13247810

Chicago/Turabian Style

Ponzo, Valentina, Rosalba Rosato, Maria Carmine Scigliano, Martina Onida, Simona Cossai, Morena De Vecchi, Andrea Devecchi, Ilaria Goitre, Enrica Favaro, Fabio Dario Merlo, et al. 2024. "Comparison of the Accuracy, Completeness, Reproducibility, and Consistency of Different AI Chatbots in Providing Nutritional Advice: An Exploratory Study." Journal of Clinical Medicine 13, no. 24: 7810. https://doi.org/10.3390/jcm13247810

APA Style

Ponzo, V., Rosato, R., Scigliano, M. C., Onida, M., Cossai, S., De Vecchi, M., Devecchi, A., Goitre, I., Favaro, E., Merlo, F. D., Sergi, D., & Bo, S. (2024). Comparison of the Accuracy, Completeness, Reproducibility, and Consistency of Different AI Chatbots in Providing Nutritional Advice: An Exploratory Study. Journal of Clinical Medicine, 13(24), 7810. https://doi.org/10.3390/jcm13247810
