Family history is considered a risk factor for many diseases because it implicitly captures shared genetic, environmental, and lifestyle factors. Finland's nationwide electronic health record (EHR) system, which spans multiple generations, presents new opportunities for studying a connected network of medical histories across entire families. In this work, we present a graph-based deep learning approach for learning explainable, supervised representations of how each family member's longitudinal medical history influences a patient's disease risk. On Finland's nationwide EHR system, comprising 7 million individuals with relatives up to the third degree, we demonstrate that this approach improves prediction of 10-year disease onset for five complex disease phenotypes compared with clinically inspired and deep learning baselines. Using graph explainability techniques, we further illustrate that a graph-based approach enables more personalized modeling of family information and disease risk by identifying the relatives and features most important for prediction.
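As a rough illustration of how a patient's risk representation can draw on each relative's history while still exposing per-relative importance, the sketch pools relatives' feature vectors with softmax attention and reads the attention weights as importance scores. It is a hypothetical, minimal stand-in and not the model described in this work: `Person`, `family_risk`, the fixed query and output weights, and the use of attention weights as an explanation are all assumptions. Dedicated graph explainability methods (for example mask- or gradient-based explainers) are more involved.

```python
import math
from dataclasses import dataclass


@dataclass
class Person:
    """One node in a hypothetical family graph."""
    name: str
    # Hypothetical fixed-length summary of a longitudinal record (e.g. per-code counts).
    features: list


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def family_risk(patient, relatives, attn_query, out_weights, bias=0.0):
    """Pool relatives' features with attention, then score the patient's risk.

    Returns (risk in (0, 1), {relative name: attention weight}).
    """
    dim = len(patient.features)
    if relatives:
        # Attention score per relative: similarity to a query vector (learned in practice, fixed here).
        weights = softmax([dot(attn_query, r.features) for r in relatives])
        pooled = [sum(w * r.features[i] for w, r in zip(weights, relatives)) for i in range(dim)]
    else:
        weights, pooled = [], [0.0] * dim
    # Concatenate the patient's own history with the attention-pooled family summary.
    h = list(patient.features) + pooled
    risk = 1.0 / (1.0 + math.exp(-(dot(out_weights, h) + bias)))
    importance = {r.name: w for r, w in zip(relatives, weights)}
    return risk, importance


patient = Person("patient", [1.0, 0.0])
relatives = [Person("mother", [2.0, 1.0]), Person("cousin", [0.0, 0.0])]
risk, importance = family_risk(patient, relatives, attn_query=[1.0, 1.0],
                               out_weights=[0.5, 0.1, 0.3, 0.2])
print(f"risk={risk:.3f}", importance)
```

In a trained model the query and output weights would be learned jointly from outcome labels, and the family summary could come from message passing over the full pedigree graph rather than a single attention step.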