J Hum Nutr Diet. 2026 Feb;39(1):e70211. doi: 10.1111/jhn.70211.
ABSTRACT
BACKGROUND: Artificial intelligence (AI) chatbots are increasingly used in various fields, including nutrition. Despite their growing popularity, there is limited evidence regarding their accuracy and reliability, and therefore their potential for clinical nutrition education. This study aimed to assess the performance of basic versions of AI chatbots in answering multiple-choice questions related to clinical nutrition.
METHODS: Four AI chatbots (ChatGPT, DeepSeek, Gemini and AI Clinical Nutritionist) were evaluated on the accuracy of their responses to multiple-choice questions. Two hundred and seventy-six multiple-choice questions were developed using Krause's Food & the Nutrition Care Process textbook, covering 16 adult disease topics related to pathophysiology, aetiology, medical treatment and nutritional treatment. The test-retest reliability of each chatbot's responses to the same set of multiple-choice questions at two different time points was evaluated using Cohen's Kappa (κ), and the reliability of the four AI tools was evaluated using the intraclass correlation coefficient (ICC).
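As an illustration of the test-retest statistic named above, the following is a minimal sketch of Cohen's Kappa for one chatbot's answers across two rounds. The function name `cohens_kappa` and the answer lists are hypothetical toy data, not the study's data or analysis code.

```python
from collections import Counter


def cohens_kappa(round1, round2):
    """Cohen's Kappa for two equally long sequences of categorical answers."""
    if len(round1) != len(round2) or not round1:
        raise ValueError("both rounds must be non-empty and of equal length")
    n = len(round1)
    # Observed agreement: share of questions answered identically in both rounds.
    observed = sum(a == b for a, b in zip(round1, round2)) / n
    # Chance agreement: product of the marginal answer frequencies, summed over options.
    counts1, counts2 = Counter(round1), Counter(round2)
    expected = sum(counts1[k] * counts2[k] for k in set(counts1) | set(counts2)) / (n * n)
    if expected == 1.0:
        # Both rounds used a single identical option throughout; treat as perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical toy answers (options A-D) for eight questions, NOT study data.
toy_round1 = ["A", "B", "C", "D", "A", "B", "C", "D"]
toy_round2 = ["A", "B", "C", "D", "A", "B", "C", "A"]
kappa = cohens_kappa(toy_round1, toy_round2)
print(f"kappa = {kappa:.3f}")
```

In this toy case the two rounds agree on seven of eight questions, and κ corrects that raw agreement for the agreement expected by chance.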
RESULTS: The ICC value was 0.901 (0.880-0.918), and the κ value was at least 0.766 for each AI tool (p < 0.001). In the first round, the accuracy rates were 72.8% for ChatGPT, 73.9% for DeepSeek, 69.6% for Gemini and 73.2% for AI Clinical Nutritionist. In the second round, these rates were 71.4%, 72.5%, 69.3% and 70.3%, respectively. The highest overall accuracy was observed with DeepSeek (73.18%), while Gemini showed the lowest performance (69.3%). When analysed by topic, the highest accuracy was found in 'Diabetes mellitus and non-diabetic hypoglycemia' and 'Upper GI Diseases' (100%), while the lowest was in 'Anaemia' (36.4%). Among the fields, the lowest accuracy was observed in medical treatment (63.6%), followed by nutritional treatment (67.4%). The mean accuracy of AI chatbots in responding to clinical nutrition multiple-choice questions was 71.6%, with similar performance across platforms.
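The reported mean accuracy can be checked against the per-round rates with a short calculation. This is a sketch under the assumption that the overall mean is the unweighted average of the eight reported rates; the abstract does not state how it was computed, and the variable names are illustrative.

```python
# Accuracy rates (%) as reported in the RESULTS for each round.
round1 = {"ChatGPT": 72.8, "DeepSeek": 73.9, "Gemini": 69.6, "AI Clinical Nutritionist": 73.2}
round2 = {"ChatGPT": 71.4, "DeepSeek": 72.5, "Gemini": 69.3, "AI Clinical Nutritionist": 70.3}

# Assumption: each tool's overall accuracy is the simple mean of its two rounds.
per_tool = {name: (round1[name] + round2[name]) / 2 for name in round1}

# Assumption: the overall mean weights every tool equally.
overall = sum(per_tool.values()) / len(per_tool)

best = max(per_tool, key=per_tool.get)
worst = min(per_tool, key=per_tool.get)

for name, value in per_tool.items():
    print(f"{name}: {value:.2f}%")
print(f"overall mean: {overall:.3f}%  (best: {best}, worst: {worst})")
```

Under these assumptions the overall mean is consistent with the reported 71.6%, and DeepSeek and Gemini remain the highest and lowest performers. Per-tool averages computed this way can differ slightly from the reported overall figures, which may have been derived from question-level counts.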
CONCLUSION: While the notable accuracy in certain topics and the high reliability demonstrate the potential of AI chatbots for clinical nutrition, the findings also highlight the need for continuous updates and performance optimization in topics where accuracy is lower.
PMID:41631933 | DOI:10.1111/jhn.70211