BMC Med Inform Decis Mak. 2026 May 15. doi: 10.1186/s12911-026-03547-5. Online ahead of print.
ABSTRACT
BACKGROUND: Sleep apnea syndrome (SAS) is associated with an increased risk of non-alcoholic fatty liver disease (NAFLD), but current clinical tools do not integrate multidimensional data for accurate risk prediction. This study used several machine learning algorithms to develop and validate a model for predicting NAFLD in patients with SAS.
METHODS: This retrospective multi-center study used data from 595 SAS patients diagnosed in 2024 at Wuxi Fifth People's Hospital (training cohort) and 372 patients from Wuxi Second People's Hospital (external validation cohort). Demographic, anthropometric, biochemical, and comorbidity data were collected. Multivariate logistic regression was used to identify risk factors for NAFLD in SAS patients, and restricted cubic spline (RCS) analysis was used to test these risk factors for non-linear trends. After preprocessing and variable selection with Boruta and LASSO regression, nine machine learning models were trained. The best-performing model was selected by area under the receiver operating characteristic curve (AUC) and then externally validated.
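The AUC-based selection step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, model names, labels, and scores are all hypothetical, and the AUC is computed from its pairwise-ranking definition rather than with any particular library.

```python
from typing import Dict, List, Tuple


def auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def select_best_model(
    labels: List[int], model_scores: Dict[str, List[float]]
) -> Tuple[str, float]:
    """Return the name and AUC of the candidate with the highest AUC."""
    aucs = {name: auc(labels, scores) for name, scores in model_scores.items()}
    best = max(aucs, key=aucs.get)
    return best, aucs[best]


# Hypothetical toy data (not from the study): 1 = NAFLD, 0 = no NAFLD.
labels = [1, 1, 1, 0, 0, 0]
model_scores = {
    "model_a": [0.9, 0.8, 0.4, 0.5, 0.2, 0.1],
    "model_b": [0.6, 0.3, 0.2, 0.7, 0.5, 0.1],
}
best_name, best_auc = select_best_model(labels, model_scores)
```

The pairwise definition is quadratic in cohort size, which is acceptable for cohorts of a few hundred patients; a rank-based implementation would be preferable for larger data.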
RESULTS: Multivariate logistic regression found that sex, obesity, hyperlipidemia, low-density lipoprotein cholesterol (LDL-C), the aspartate aminotransferase to alanine aminotransferase ratio (AST/ALT ratio), uric acid, γ-glutamyltransferase (GGT), albumin, and platelet count (PLT) were significantly associated with NAFLD in SAS patients. The AST/ALT ratio, GGT, albumin, and PLT showed non-linear relationships with NAFLD. Among the nine machine learning models, logistic regression performed best (AUC = 0.752, 95% CI: 0.654-0.850). Key predictive features included LDL-C, hyperlipidemia, high-density lipoprotein cholesterol (HDL-C), the AST/ALT ratio, and GGT. SHapley Additive exPlanations (SHAP) analysis showed that elevated LDL-C, hyperlipidemia, and elevated GGT increased the predicted risk of NAFLD, while higher HDL-C levels had a protective effect. The AST/ALT ratio was negatively correlated with risk.
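The reported interval is symmetric about the point estimate (0.752 - 0.654 = 0.850 - 0.752 = 0.098), which is consistent with a normal-approximation interval, although the abstract does not say how it was computed. As one common possibility, the sketch below applies the Hanley-McNeil standard error. The choice of method is an assumption, and the case counts are hypothetical because the abstract does not report how many patients had NAFLD.

```python
import math
from typing import Tuple


def hanley_mcneil_ci(
    auc: float, n_pos: int, n_neg: int, z: float = 1.96
) -> Tuple[float, float, float]:
    """Return (standard error, lower, upper) for an AUC using the
    Hanley-McNeil approximation and a normal (Wald-type) interval."""
    if not 0.0 < auc < 1.0:
        raise ValueError("auc must lie strictly between 0 and 1")
    if n_pos < 1 or n_neg < 1:
        raise ValueError("need at least one positive and one negative case")
    q1 = auc / (2.0 - auc)              # P(two positives both outrank one negative)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)   # P(one positive outranks two negatives)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    se = math.sqrt(variance)
    # Clip to [0, 1] because the normal approximation can overshoot.
    return se, max(0.0, auc - z * se), min(1.0, auc + z * se)


# AUC from the abstract; n_pos and n_neg are hypothetical placeholders.
se, lower, upper = hanley_mcneil_ci(0.752, n_pos=60, n_neg=40)
```

A bootstrap or DeLong interval would be a reasonable alternative; the interval width depends strongly on the number of positive and negative cases, so the placeholder counts should be replaced with the real ones before comparing against the reported values.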
CONCLUSION: Sex, obesity, hyperlipidemia, LDL-C, AST/ALT ratio, uric acid, GGT, albumin, and PLT were significantly associated with NAFLD in SAS patients. The logistic regression model developed in this study effectively predicts the risk of NAFLD in SAS patients and highlights the central role of lipid metabolism abnormalities and markers of oxidative stress. This tool offers a practical, easily interpretable approach to early risk stratification and targeted intervention in clinical practice.
CLINICAL TRIAL NUMBER: Not applicable.
PMID:42141448 | DOI:10.1186/s12911-026-03547-5

