A common assumption in causal inference from observational data is that there is no hidden confounding. Yet it is, in general, impossible to verify this assumption from a single dataset. Under the assumption that the data-generating process consists of independent causal mechanisms, we demonstrate a way to detect unobserved confounders when multiple observational datasets from different environments are available. We present a theory for testable conditional independencies that are absent only when there is hidden confounding, and we examine cases where the theory's assumptions are violated: degenerate and dependent mechanisms, and faithfulness violations. Additionally, we propose a procedure to test these independencies and study its empirical finite-sample behavior using simulation studies and semi-synthetic data based on a real-world dataset. In most cases, the proposed procedure correctly predicts the presence of hidden confounding, particularly when the confounding bias is large.
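The intuition can be illustrated with a toy simulation. This is a minimal sketch under assumptions of our own choosing, not the testing procedure proposed in the paper: the linear-Gaussian model, the function names `env_summaries` and `cross_env_correlation`, and all parameter values are hypothetical. Each environment draws its mechanism parameters independently. Without a hidden confounder, the observed summary of P(T) and the fitted summary of P(Y | T) should then vary independently across environments; with a hidden confounder U, both inherit the environment-specific shift in U and become dependent.

```python
import numpy as np


def env_summaries(n_envs, n_per_env, confounded, rng):
    """Per-environment (mean of T, intercept of Y regressed on T)."""
    gamma = 1.0 if confounded else 0.0  # effect of the hidden confounder U
    t_means = np.empty(n_envs)
    intercepts = np.empty(n_envs)
    for e in range(n_envs):
        # Independent mechanisms: each environment draws its own shifts
        # for U, T, and Y independently of one another.
        # The spreads of mu_u and mu_t are deliberately unequal: in this
        # particular linear model, equal spreads make the induced
        # correlation cancel exactly (a faithfulness-type violation).
        mu_u = rng.normal(0.0, 2.0)
        mu_t = rng.normal(0.0, 0.5)
        mu_y = rng.normal(0.0, 1.0)
        u = rng.normal(mu_u, 1.0, n_per_env)  # hidden, never observed
        t = mu_t + gamma * u + rng.normal(size=n_per_env)
        y = mu_y + 0.5 * t + gamma * u + rng.normal(size=n_per_env)
        # Summaries computed from the observed variables (T, Y) only.
        slope, intercept = np.polyfit(t, y, 1)
        t_means[e] = t.mean()
        intercepts[e] = intercept
    return t_means, intercepts


def cross_env_correlation(confounded, n_envs=300, n_per_env=200, seed=0):
    """Correlation, across environments, between the two summaries."""
    rng = np.random.default_rng(seed)
    t_means, intercepts = env_summaries(n_envs, n_per_env, confounded, rng)
    return float(np.corrcoef(t_means, intercepts)[0, 1])


if __name__ == "__main__":
    print("no hidden confounder:", cross_env_correlation(confounded=False))
    print("hidden confounder:   ", cross_env_correlation(confounded=True))
```

In this sketch, a cross-environment correlation near zero is consistent with no hidden confounding, while a clearly nonzero correlation signals it. A plain correlation only captures linear dependence between two scalar summaries, so it stands in for, and is much weaker than, a proper conditional independence test.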