A common assumption in causal inference from observational data is that there is no hidden confounding. Yet it is, in general, impossible to verify this assumption from a single dataset. Under the assumption of independent causal mechanisms underlying the data-generating process, we demonstrate a way to detect unobserved confounders when multiple observational datasets from different environments are available. We present a theory of testable conditional independencies that are absent only when there is hidden confounding, and we examine cases in which its assumptions are violated: degenerate and dependent mechanisms, and faithfulness violations. Additionally, we propose a procedure to test these independencies and study its empirical finite-sample behavior in simulation studies and on semi-synthetic data based on a real-world dataset. In most cases, the proposed procedure correctly predicts the presence of hidden confounding, particularly when the confounding bias is large.