This study introduces a novel approach for early Type 2 Diabetes Mellitus (T2DM) risk prediction using a tabular transformer (TabTrans) architecture to analyze longitudinal patient data. By processing patients' longitudinal health records and bone-related tabular data, our model captures complex, long-range dependencies in disease progression that conventional methods often overlook. We validated our TabTrans model on a retrospective Qatar BioBank (QBB) cohort of 1,382 subjects, comprising 725 men (146 diabetic, 579 healthy) and 657 women (133 diabetic, 524 healthy). The study integrated electronic health records (EHR) with dual-energy X-ray absorptiometry (DXA) data. To address class imbalance, we employed SMOTE and SMOTE-ENN resampling techniques. The proposed model's performance was evaluated against conventional machine learning (ML) and generative AI models, including Claude 3.5 Sonnet (Anthropic's constitutional AI), GPT-4 (OpenAI's generative pre-trained transformer), and Gemini Pro (Google's multimodal language model). Our TabTrans model outperformed both the generative AI models and the conventional ML approaches, achieving a ROC AUC of $\geq$ 79.7% for T2DM prediction. Feature interpretation analysis identified visceral adipose tissue (VAT) mass and volume, Ward's triangle bone mineral density (BMD) and bone mineral content (BMC), T- and Z-scores, and L1-L4 scores as the most important predictors associated with diabetes development in Qatari adults. These findings demonstrate the potential of TabTrans for analyzing complex tabular healthcare data, providing a tool for proactive T2DM management and personalized clinical interventions in the Qatari population.

Index Terms: tabular transformers, multimodal data, DXA data, diabetes, T2DM, feature interpretation, tabular data
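As an illustration of the SMOTE oversampling step mentioned above, the following is a minimal sketch of the interpolation idea, not the study's actual pipeline. Only the class counts (146 diabetic, 579 healthy men) come from the abstract; the two-dimensional feature values are random placeholders, and the helper name `smote`, the neighbour count `k=5`, and the seeds are assumptions made for this example. The ENN cleaning stage of SMOTE-ENN is not shown.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen minority point and one of its k nearest minority neighbours.
    (Hypothetical helper for illustration; simplified relative to library SMOTE.)"""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k closest minority points (squared Euclidean), excluding base itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(base, minority[j])),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


# Class counts for the male subgroup, taken from the abstract.
n_diabetic, n_healthy = 146, 579

# Placeholder features (NOT real QBB data): two Gaussian-distributed columns per subject.
data_rng = random.Random(42)
diabetic = [[data_rng.gauss(1.0, 1.0), data_rng.gauss(1.0, 1.0)] for _ in range(n_diabetic)]
healthy = [[data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(n_healthy)]

# Oversample the minority class until both classes have the same size.
new_points = smote(diabetic, n_new=n_healthy - n_diabetic)
balanced_minority = diabetic + new_points
print(len(balanced_minority), len(healthy))
```

In practice, resampling of this kind is normally applied to the training split only, so that synthetic samples do not leak into the evaluation data; whether the study followed that convention is not stated in the abstract.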