Invariant causal prediction (ICP; Peters et al., 2016) provides a novel way to identify causal predictors of a response by exploiting heterogeneous data from different environments. One notable advantage of ICP is its guarantee that, with high probability, no false causal discoveries are made. Such a guarantee, however, can be overly conservative in some applications, resulting in few or no causal discoveries. This raises a natural question: can we use less conservative error control guarantees for ICP so that more causal information can be extracted from the data? We address this question in this paper. We focus on two commonly used and more liberal guarantees: false discovery rate control and the simultaneous true discovery bound. Unexpectedly, we find that the false discovery rate does not seem to be a suitable error criterion for ICP. The simultaneous true discovery bound, on the other hand, proves to be an ideal choice, enabling users to explore potential causal predictors and extract more causal information. Importantly, the additional information comes for free, in the sense that no extra assumptions are required and the discoveries from the original ICP approach are fully retained. We demonstrate the practical utility of our method through simulations and a real dataset on the educational attainment of teenagers in the US.