Methods of causal discovery aim to identify causal structures in a data-driven way. Existing algorithms are known to be unstable and sensitive to statistical errors, and are therefore rarely used with biomedical or epidemiological data. We present an algorithm that efficiently exploits temporal structure, so-called tiered background knowledge, to estimate causal structures. Tiered background knowledge is readily available from, e.g., cohort or registry data. When used efficiently, it renders the algorithm more robust to statistical errors and ultimately increases accuracy in finite samples. We describe the algorithm and illustrate how it proceeds. Moreover, we offer formal proofs as well as examples of desirable properties of the algorithm, which we demonstrate empirically in an extensive simulation study. To illustrate its usefulness in practice, we apply the algorithm to data from a children's cohort study investigating the interplay of diet, physical activity and other lifestyle factors for health outcomes.
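
To make the notion of tiered background knowledge concrete, here is a minimal Python sketch. It is not the algorithm presented in the paper; it only illustrates the basic constraint that a variable measured in a later tier cannot cause one measured in an earlier tier, so any adjacency that crosses tiers can be oriented from the earlier to the later tier. The function name `orient_by_tiers`, the variable names, and the toy skeleton are all hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def orient_by_tiers(
    skeleton_edges: List[Edge], tier: Dict[str, int]
) -> Tuple[List[Edge], List[Edge]]:
    """Split undirected edges into cross-tier edges, oriented from the
    earlier tier to the later tier, and within-tier edges, left undirected.

    Illustrative only: a full causal discovery algorithm would also
    estimate the skeleton and orient within-tier edges.
    """
    directed: List[Edge] = []
    undirected: List[Edge] = []
    for a, b in skeleton_edges:
        if tier[a] < tier[b]:
            directed.append((a, b))    # a precedes b in time
        elif tier[b] < tier[a]:
            directed.append((b, a))    # b precedes a in time
        else:
            undirected.append((a, b))  # same tier: tiers give no orientation
    return directed, undirected


# Hypothetical cohort variables measured at two survey waves.
tier = {"diet_t1": 1, "activity_t1": 1, "bmi_t2": 2}
skeleton = [("bmi_t2", "diet_t1"), ("diet_t1", "activity_t1")]
directed, undirected = orient_by_tiers(skeleton, tier)
```

In this toy input, the edge between `bmi_t2` and `diet_t1` is oriented from the first wave to the second, while the edge between the two first-wave variables stays undirected because the tier ordering carries no information about it.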