We provide an accessible description of a peer-reviewed, generalizable causal machine learning pipeline that (i) discovers latent causal sources underlying observations in large-scale electronic health records (EHRs) and (ii) quantifies the causal effects of those sources on clinical outcomes. We illustrate how imperfect multimodal clinical data can be processed, decomposed into probabilistically independent latent sources, and used to train task-specific causal models from which individual causal effects can be estimated. We summarize the findings of the two real-world applications of the approach to date to demonstrate its versatility and utility for medical discovery at scale.
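The abstract does not name the algorithms behind each stage. The sketch below therefore swaps in common stand-ins, all of which are assumptions and not the authors' method. Mean imputation handles the imperfect data. FastICA (symmetric, tanh nonlinearity) decomposes the observations into independent latent sources. A linear T-learner estimates the individual effects of one binarized source on an outcome. The function names (`impute_and_standardize`, `fast_ica`, `t_learner_ite`) are hypothetical and the data are synthetic. The T-learner also assumes that the remaining latent sources form a sufficient adjustment set.

```python
import numpy as np


def impute_and_standardize(X):
    """Hypothetical preprocessing: fill NaNs with column means, then z-score each column."""
    X = np.array(X, dtype=float)
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    std = X.std(axis=0)
    std[std == 0] = 1.0
    return (X - X.mean(axis=0)) / std


def _sym_decorrelate(W):
    """W <- (W W^T)^{-1/2} W, keeping the unmixing rows orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return (u / np.sqrt(s)) @ u.T @ W


def fast_ica(X, n_sources, n_iter=200, tol=1e-6, seed=0):
    """Stand-in decomposition: symmetric FastICA; returns sources (n_samples x n_sources)."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    # Whiten: project onto the top principal components, scaled to unit variance.
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_sources]
    Z = Xc @ (eigvecs[:, order] / np.sqrt(eigvals[order]))
    W = _sym_decorrelate(rng.standard_normal((n_sources, n_sources)))
    for _ in range(n_iter):
        g = np.tanh(Z @ W.T)
        g_prime = 1.0 - g ** 2
        # Fixed-point update: w <- E[z g(w^T z)] - E[g'(w^T z)] w, for every row at once.
        W_new = _sym_decorrelate((g.T @ Z) / Z.shape[0] - g_prime.mean(axis=0)[:, None] * W)
        converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return Z @ W.T


def t_learner_ite(S, y, source_idx):
    """Individual effects of a median-split latent source on y via a linear T-learner.

    Assumption: the other latent sources are sufficient adjustment covariates.
    """
    treated = S[:, source_idx] > np.median(S[:, source_idx])
    design = np.column_stack([np.ones(len(y)), np.delete(S, source_idx, axis=1)])
    beta1, *_ = np.linalg.lstsq(design[treated], y[treated], rcond=None)
    beta0, *_ = np.linalg.lstsq(design[~treated], y[~treated], rcond=None)
    # Per-patient effect: predicted outcome if "exposed" minus if "unexposed".
    return design @ beta1 - design @ beta0


# Synthetic example: 3 non-Gaussian sources mixed into 6 noisy, partly missing measurements.
rng = np.random.default_rng(42)
n = 2000
true_sources = np.column_stack(
    [rng.laplace(size=n), rng.uniform(-1, 1, size=n), rng.laplace(size=n)]
)
mixing = rng.normal(size=(3, 6))
X = true_sources @ mixing + 0.05 * rng.normal(size=(n, 6))
X[rng.random(X.shape) < 0.05] = np.nan  # simulate missing measurements
y = 2.0 * (true_sources[:, 0] > 0) + true_sources[:, 1] + rng.normal(scale=0.1, size=n)

S = fast_ica(impute_and_standardize(X), n_sources=3)
ite = t_learner_ite(S, y, source_idx=0)
print(S.shape, ite.shape)  # prints (2000, 3) (2000,)
```

ICA recovers sources only up to order, sign, and scale. As a result, `source_idx=0` need not correspond to the first simulated source, and each recovered source would need inspection before its estimated effect is interpreted.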