Background: Alzheimer's disease and related dementias (ADRD) rank as the sixth leading cause of death in the US, underscoring the importance of accurate ADRD risk prediction. While recent advances in ADRD risk prediction have relied primarily on imaging analysis, not all patients undergo medical imaging before an ADRD diagnosis. Combining machine learning with claims data can reveal additional risk factors and uncover interconnections among diverse medical codes.
Objective: Our goal is to use Graph Neural Networks (GNNs) with claims data for ADRD risk prediction. To address the lack of human-interpretable reasons behind these predictions, we introduce a novel method that evaluates the importance of relationships between medical codes and their influence on ADRD risk prediction, supporting a comprehensive interpretation of the model.
Methods: We employed a Variationally Regularized Encoder-decoder Graph Neural Network (VGNN) to estimate ADRD likelihood. We designed three scenarios to assess the model's performance, using Random Forest and Light Gradient Boosting Machine (LightGBM) as baselines. We then applied our relation importance method to identify the key relationships for ADRD risk prediction.
Results: VGNN outperformed the baseline models by 10% in the area under the receiver operating characteristic curve (AUROC). Integrating the GNN model with relation importance interpretation could play an important role in providing insight into factors that may contribute to or delay ADRD progression.
Conclusions: Applying a GNN approach to claims data improves ADRD risk prediction and provides insight into the impact of interconnected medical code relationships. This methodology not only enables ADRD risk modeling but also shows potential for other prediction tasks that typically rely on image analysis, using claims data instead.
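To make the idea of modeling interconnections among medical codes more concrete, here is a minimal Python sketch under stated assumptions. It treats one patient's claims codes as nodes of a fully connected graph, scores each directed code pair with scaled dot-product attention over fixed toy embeddings, and ranks pairs by their weight. The function names (`build_code_graph`, `attention_weights`, `rank_relations`), the two-dimensional embeddings, and the example codes are all hypothetical. A trained VGNN would learn its attention parameters from data, and ranking edges by attention weight is only a crude stand-in for relation importance, not the authors' method.

```python
import math
from itertools import permutations

# Hypothetical helpers for illustration only; this is not the VGNN implementation
# and not the relation importance method from the study.

def build_code_graph(codes):
    """Treat a patient's unique codes as nodes of a fully connected directed graph (no self-loops)."""
    nodes = sorted(set(codes))
    edges = list(permutations(nodes, 2))
    return nodes, edges

def attention_weights(nodes, embeddings):
    """Softmax-normalized scaled dot-product scores from each node to every other node.

    Assumption: fixed embeddings stand in for learned query/key projections.
    """
    weights = {}
    for i in nodes:
        others = [j for j in nodes if j != i]
        if not others:
            continue  # a single-code patient has no relations to score
        d = len(embeddings[i])
        scores = [
            sum(a * b for a, b in zip(embeddings[i], embeddings[j])) / math.sqrt(d)
            for j in others
        ]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        for j, e in zip(others, exps):
            weights[(i, j)] = e / total
    return weights

def rank_relations(weights, top_k=3):
    """Return the top_k directed code pairs by attention weight (a crude importance proxy)."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# Toy example: ICD-10-CM codes for essential hypertension (I10), type 2 diabetes
# without complications (E11.9), and bilateral sensorineural hearing loss (H90.3),
# with made-up 2-D embeddings (hypothetical values).
codes = ["I10", "E11.9", "I10", "H90.3"]
embeddings = {"I10": [1.0, 0.0], "E11.9": [0.9, 0.1], "H90.3": [0.0, 1.0]}

nodes, edges = build_code_graph(codes)
weights = attention_weights(nodes, embeddings)
top = rank_relations(weights, top_k=2)
for (src, dst), w in top:
    print(f"{src} -> {dst}: {w:.3f}")
```

Because the toy embeddings for hypertension and diabetes point in similar directions, the two pairs linking those codes receive the highest weights; with learned embeddings, the ranking would instead reflect patterns in the training data.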