Family history is considered a risk factor for many diseases because it implicitly captures shared genetic, environmental, and lifestyle factors. A nationwide electronic health record (EHR) system spanning multiple generations presents new opportunities for studying a connected network of medical histories for entire families. In this work we present a graph-based deep learning approach for learning explainable, supervised representations of how each family member's longitudinal medical history influences a patient's disease risk. We demonstrate that this approach improves prediction of 10-year disease onset for five complex disease phenotypes, compared to clinically inspired and deep learning baselines, on a nationwide EHR system comprising 7 million individuals with up to third-degree relatives. Using graph explainability techniques, we illustrate that a graph-based approach enables more personalized modeling of family information and disease risk by identifying the relatives and features that are important for prediction.