Dysglycemia, encompassing both prediabetes and diabetes, affects a large share of adults worldwide, yet many cases remain undiagnosed. We developed and validated machine-learning (ML) models for non-invasive dysglycemia risk screening that require no laboratory tests. Using pooled data from the National Health and Nutrition Examination Survey (NHANES) 2017--2023 (n=14,352), we trained six ML models with stratified 5-fold cross-validation and compared them against two established clinical risk scores. LightGBM achieved the highest area under the receiver operating characteristic curve (AUC=0.820, 95% CI: 0.806--0.835), outperforming the Finnish Diabetes Risk Score (AUC=0.745) and the American Diabetes Association Risk Test (AUC=0.783). SHAP (SHapley Additive exPlanations) analysis identified age, race/ethnicity, and waist-to-height ratio as the most influential predictors. Subgroup analyses showed consistent performance across demographic strata (AUC range: 0.735--0.832). These results demonstrate the feasibility of explainable, laboratory-free dysglycemia screening for deployment in community settings and self-tracking health applications.
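The evaluation protocol summarized above is stratified 5-fold cross-validation scored by AUC and reported with a 95% confidence interval. It can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' pipeline. The hypothetical `mean_difference_trainer` is a toy stand-in for LightGBM and the other five models. The AUC is computed once on pooled out-of-fold scores. The interval comes from a percentile bootstrap, and the abstract does not say which CI method was used. All function names are hypothetical, and the demo uses synthetic data, not NHANES.

```python
import random
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]
Predictor = Callable[[Features], float]
# Hypothetical trainer signature: fit on (X, y) and return a scoring function.
Trainer = Callable[[List[Features], List[int]], Predictor]


def stratified_kfold(y: Sequence[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Split indices into k folds that each keep roughly the overall class ratio."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):  # deal this class round-robin across folds
            folds[j % k].append(i)
    return folds


def roc_auc(y: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(random positive scores above random negative); ties count 1/2."""
    pos = [s for s, v in zip(scores, y) if v == 1]
    neg = [s for s, v in zip(scores, y) if v == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auc(X: List[Features], y: List[int], train: Trainer,
                        k: int = 5, seed: int = 0) -> Tuple[float, List[float]]:
    """Collect out-of-fold scores over k stratified folds, then compute one pooled AUC."""
    oof = [0.0] * len(y)
    for test_idx in stratified_kfold(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = train([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            oof[i] = model(X[i])
    return roc_auc(y, oof), oof


def bootstrap_auc_ci(y: Sequence[int], scores: Sequence[float], n_boot: int = 500,
                     alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for the AUC (an assumed CI method)."""
    rng = random.Random(seed)
    n = len(y)
    aucs: List[float] = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        if 0 < sum(yb) < n:  # with 0/1 labels: skip resamples missing a class
            aucs.append(roc_auc(yb, [scores[i] for i in idx]))
    aucs.sort()
    return aucs[int(alpha / 2 * n_boot)], aucs[int((1 - alpha / 2) * n_boot) - 1]


def mean_difference_trainer(X: List[Features], y: List[int]) -> Predictor:
    """Hypothetical toy model standing in for LightGBM: weight = class-mean gap per feature."""
    pos = [x for x, v in zip(X, y) if v == 1]
    neg = [x for x, v in zip(X, y) if v == 0]
    w = [sum(x[j] for x in pos) / len(pos) - sum(x[j] for x in neg) / len(neg)
         for j in range(len(X[0]))]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))


if __name__ == "__main__":
    rng = random.Random(42)
    X: List[Features] = []
    y: List[int] = []
    for _ in range(300):
        label = 1 if rng.random() < 0.4 else 0
        # Synthetic data: feature 0 is shifted for positives, feature 1 is pure noise.
        X.append([rng.gauss(0.8 if label else 0.0, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    auc, oof = cross_validated_auc(X, y, mean_difference_trainer)
    lo, hi = bootstrap_auc_ci(y, oof, n_boot=200)
    print(f"pooled out-of-fold AUC={auc:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

The sketch pools out-of-fold scores before computing a single AUC. Averaging the five per-fold AUCs is an equally common convention, and the abstract does not specify which one was used. Stratifying the folds keeps the dysglycemia prevalence similar in every held-out fold, so each fold's AUC estimate rests on a comparable positive/negative mix.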