Cien Saude Colet. 2026 May;31(5):e01132026. doi: 10.1590/1413-81232026315.01132026. Epub 2026 Jan 28.
ABSTRACT
Obesity represents a growing challenge for Primary Health Care (PHC), demanding new technologies and innovative solutions. This study evaluated the adherence of Large Language Models (LLMs) to Brazilian guidelines for obesity management in PHC. An in silico experimental study was conducted in two phases. First, 62 LLMs underwent latency screening; 24 met the usability criterion (<10 s) and proceeded to content evaluation. Clinical adherence was measured with 16 questions extracted verbatim from the Clinical Protocol and Therapeutic Guidelines (PCDT), covering diagnosis, monitoring, and treatment, and was assessed by peer reviewers (Kappa 0.68-1.00). The exclusion of 38 models (61%) due to latency highlighted infrastructure barriers. Among the 24 viable models, adherence was heterogeneous and limited, with the best-performing model reaching only 61.1% compliance. The correlation between model size and accuracy was moderate, but multimodality proved to be a predictor of better performance. In conclusion, current LLMs are not safe for autonomous operation in PHC and require rigorous professional curation, continuous validation, and mechanisms for escalation to human care to ensure user safety.
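The two quantitative steps named above, the latency screen and the reviewer agreement statistic, can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The abstract states only the 10 s threshold and the Kappa range, so the use of the median to summarize repeated latency measurements, the choice of Cohen's kappa for two raters, and all function names, model names, and sample values are assumptions made for illustration.

```python
from statistics import median

# Usability threshold stated in the abstract (< 10 s).
LATENCY_LIMIT_S = 10.0


def screen_by_latency(latencies, limit_s=LATENCY_LIMIT_S):
    """Return the names of models whose median latency is below limit_s.

    latencies maps a model name to a list of response times in seconds.
    Summarizing by the median is an assumption; the abstract does not
    say how repeated measurements were aggregated.
    """
    return sorted(
        name for name, samples in latencies.items()
        if median(samples) < limit_s
    )


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    Assumes two raters and nominal labels; the abstract reports a Kappa
    range but does not name the variant that was used.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    p_expected = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n)
        for label in labels
    )
    if p_expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Hypothetical measurements, not data from the study.
    demo = {
        "model_a": [2.0, 3.0, 4.0],
        "model_b": [12.0, 9.0, 15.0],
        "model_c": [9.5, 9.9, 30.0],
    }
    print(screen_by_latency(demo))
    print(cohen_kappa([1, 1, 0, 0], [1, 1, 0, 1]))
```

The same arithmetic underlies the exclusion share reported in the abstract: 38 of 62 models is about 61%.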
PMID:42385060 | DOI:10.1590/1413-81232026315.01132026