Probabilities of Causation (PoC) play a fundamental role in decision-making in law, health care, and public policy. Nevertheless, point-identifying them is challenging: it requires strong assumptions, and in their absence only bounds can be derived. Existing work that tightens these bounds by leveraging additional information has one of three limitations. It provides only numerical bounds, it provides symbolic bounds only for a fixed dimensionality, or it requires access to multiple datasets that contain the same treatment and outcome variables. However, many clinical, epidemiological, and public policy applications have external datasets that examine the effect of different treatments on the same outcome variable, or that study the association between covariates and the outcome variable. Such external datasets cannot be combined with the aforementioned bounds, because they may involve different treatment assignment mechanisms or even obey different causal structures. Here, we provide symbolic bounds on the PoC for this challenging scenario. We focus on combining either two randomized experiments that study different treatments, or a randomized experiment and an observational study, assuming causal sufficiency. Our symbolic bounds hold for covariates and treatments of arbitrary dimensionality, and we discuss the conditions under which they are tighter than existing bounds in the literature. Finally, our bounds parameterize the difference in treatment assignment mechanisms across datasets. The mechanisms can therefore vary from one dataset to another while causal information is still transferred from the external dataset to the target dataset.
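The following sketch shows concretely what "only bounds can be derived" means. It computes the classical Tian and Pearl bounds on the probability of necessity and sufficiency (PNS) for a binary treatment and a binary outcome, using a single randomized experiment. This is the kind of existing baseline that the abstract compares against. It is not the symbolic bound proposed in this work. The function name `pns_bounds_from_rct` is hypothetical. Here, `p_y_do_x` stands for P(Y=1 | do(X=1)) and `p_y_do_xprime` stands for P(Y=1 | do(X=0)).

```python
def pns_bounds_from_rct(p_y_do_x: float, p_y_do_xprime: float) -> tuple[float, float]:
    """Classical (Tian-Pearl) bounds on PNS from experimental data only.

    Hypothetical helper for illustration; binary treatment X and outcome Y.
    p_y_do_x      = P(Y=1 | do(X=1))
    p_y_do_xprime = P(Y=1 | do(X=0))
    """
    for p in (p_y_do_x, p_y_do_xprime):
        if not 0.0 <= p <= 1.0:
            raise ValueError("interventional probabilities must lie in [0, 1]")
    # Lower bound: the causal risk difference, truncated at zero.
    lower = max(0.0, p_y_do_x - p_y_do_xprime)
    # Upper bound: min{P(y_x), P(y'_{x'})}, where P(y'_{x'}) = 1 - P(y_{x'}).
    upper = min(p_y_do_x, 1.0 - p_y_do_xprime)
    return lower, upper


if __name__ == "__main__":
    print(pns_bounds_from_rct(0.75, 0.25))  # prints (0.5, 0.75)
```

Without further assumptions, such as monotonicity, or extra information, the PNS in this example is known only to lie in [0.5, 0.75]. Approaches like the one described above aim to narrow such intervals.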