Cien Saude Colet. 2026 May;31(5):e23492025. doi: 10.1590/1413-81232026315.23492025. Epub 2025 Dec 12.
ABSTRACT
This study aimed to evaluate the predictive performance of machine learning algorithms in estimating Primary Care Assessment Tool (PCATool) scores, using two datasets collected 10 years apart. Participants were adults (≥18 years) and children (≤12 years) surveyed in Rio de Janeiro in 2014 and 2024. The outcomes were the PCATool i) general and ii) essential scores, each dichotomized as Low (≤6.6) or High (>6.6). The 2014 data were split 80:20 into training and test sets, and the 2024 data were used for external validation. A total of 4,417 adults and 4,072 children were included. The Random Forest Classifier (RFC) showed the best overall performance for the general score in adults, with an AUC of 0.68 (95%CI [0.64-0.72]) in the test set and 0.66 (95%CI [0.62-0.69]) in the validation set. For the essential score in adults, RFC also outperformed the other models, with an AUC of 0.68 (95%CI [0.63-0.72]) in the test set and 0.65 (95%CI [0.62-0.69]) in the validation set. In children, Extreme Gradient Boosting (XGBoost) showed the best overall performance for the general score, with an AUC of 0.65 (95%CI [0.60-0.69]) in the test set and 0.63 (95%CI [0.59-0.66]) in the validation set. Shapley value analysis identified continuity of care with the same doctor as the most influential predictor across models. These findings highlight the potential of machine learning to predict PCATool scores.
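The evaluation protocol summarized above has three mechanical steps: dichotomizing scores at 6.6, making an 80:20 split of the 2014 data, and reporting AUC with a 95% confidence interval. The sketch below illustrates them under stated assumptions. The abstract does not say whether the split was stratified or how the intervals were computed, so a simple random split and a percentile bootstrap are assumed here. The function names (`dichotomize`, `split_80_20`, `auc`, `bootstrap_auc_ci`) are hypothetical, and model fitting (RFC, XGBoost) is deliberately left out.

```python
import random

CUTOFF = 6.6  # from the abstract: Low <= 6.6, High > 6.6


def dichotomize(score: float) -> int:
    """Return 1 for a High PCATool score (> 6.6), 0 for Low (<= 6.6)."""
    return 1 if score > CUTOFF else 0


def split_80_20(rows, seed=0):
    """Assumed simple random 80:20 split (stratification is not stated in the abstract)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(0.8 * len(rows))
    return rows[:cut], rows[cut:]


def auc(labels, probs):
    """AUC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, probs, n_boot=1000, alpha=0.05, seed=0):
    """Point AUC plus an assumed percentile-bootstrap (1 - alpha) confidence interval."""
    rng = random.Random(seed)
    idx = range(len(labels))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]  # resample cases with replacement
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # skip resamples containing only one class
            stats.append(auc(ys, [probs[i] for i in sample]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return auc(labels, probs), lo, hi


if __name__ == "__main__":
    scores = [5.9, 7.2, 6.6, 8.1, 4.3, 6.9, 6.0, 7.5]  # illustrative values, not study data
    labels = [dichotomize(s) for s in scores]
    probs = [0.2, 0.7, 0.4, 0.9, 0.1, 0.5, 0.3, 0.6]  # illustrative model outputs
    point, lo, hi = bootstrap_auc_ci(labels, probs, n_boot=200)
    print(f"AUC={point:.2f} (95%CI [{lo:.2f}-{hi:.2f}])")
```

In this framing, the 2024 survey would be scored with the same `auc`/`bootstrap_auc_ci` calls but never used for fitting, which is what makes it an external validation set rather than a second test split.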
PMID:42385075 | DOI:10.1590/1413-81232026315.23492025

