We provide an accessible description of a peer-reviewed, generalizable causal machine learning pipeline to (i) discover the latent causal sources of large-scale electronic health record observations, and (ii) quantify the causal effects of those sources on clinical outcomes. We illustrate how imperfect multimodal clinical data can be processed, decomposed into probabilistically independent latent sources, and used to train task-specific causal models from which individual causal effects can be estimated. We summarize the findings of the two real-world applications of the approach to date as a demonstration of its versatility and utility for medical discovery at scale.
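As an illustration only, the decompose-then-model idea can be sketched on synthetic data. The sketch below is not the pipeline's implementation. It assumes a noiseless linear mixture of non-Gaussian sources, substitutes a small NumPy FastICA for the probabilistic decomposition, and substitutes ordinary least squares for the task-specific causal model. All names (`whiten`, `fast_ica`, `S_true`, `effects`) and the simulated data are hypothetical, and reading the per-individual contributions as causal effects holds only under the strong assumption that the recovered sources are the true, unconfounded causes of the outcome.

```python
import numpy as np

def whiten(X):
    """Center X (n_samples, n_features) and transform it to unit covariance."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    K = eigvecs / np.sqrt(eigvals)  # scale each eigenvector by 1/sqrt(eigenvalue)
    return Xc @ K

def fast_ica(X, n_iter=200, seed=0):
    """Symmetric FastICA with a tanh nonlinearity; returns estimated sources."""
    Z = whiten(X)
    n, d = Z.shape
    rng = np.random.default_rng(seed)
    W = np.linalg.qr(rng.standard_normal((d, d)))[0]  # random orthogonal start
    for _ in range(n_iter):
        Y = Z @ W.T                       # current source estimates, shape (n, d)
        G = np.tanh(Y)
        G_prime = 1.0 - G ** 2
        # Fixed-point update for each row w_i: E[z g(w_i.z)] - E[g'(w_i.z)] w_i
        W_new = (G.T @ Z) / n - np.diag(G_prime.mean(axis=0)) @ W
        # Symmetric decorrelation: W <- (W W^T)^(-1/2) W, computed via SVD
        U, _, Vt = np.linalg.svd(W_new)
        W = U @ Vt
    return Z @ W.T

# --- Synthetic stand-in for clinical observations (assumption, not real data) ---
rng = np.random.default_rng(42)
n = 5000
S_true = np.column_stack([
    rng.laplace(size=n),            # heavy-tailed source
    rng.uniform(-1.0, 1.0, size=n), # bounded source
    rng.exponential(size=n) - 1.0,  # skewed source
])
A = np.array([[1.0, 0.5, 0.3],
              [0.4, 1.0, 0.2],
              [0.3, 0.6, 1.0]])     # hypothetical mixing matrix
X = S_true @ A.T                    # observed variables are mixtures of sources

# Outcome depends on sources 0 and 2 only; source 1 has no effect.
y = 1.5 * S_true[:, 0] - 0.8 * S_true[:, 2] + 0.1 * rng.standard_normal(n)

# Step 1: decompose observations into independent latent sources.
# Recovered sources are identified only up to order, sign, and scale.
S_hat = fast_ica(X)

# Step 2: fit an outcome model on the recovered sources.
design = np.column_stack([np.ones(n), S_hat])
coef = np.linalg.lstsq(design, y, rcond=None)[0]
intercept, beta = coef[0], coef[1:]
y_hat = design @ coef

# Step 3: per-individual, per-source contributions to the predicted outcome.
# S_hat is zero-mean, so each entry is a contribution relative to the average.
effects = S_hat * beta              # shape (n, 3)
```

In this linear stand-in, each row of `effects` sums with `intercept` to that individual's predicted outcome, so the table can be read as an additive breakdown of the prediction by source. A nonlinear outcome model would need a different attribution step.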