Discovering causal relationships from observational data is a fundamental yet challenging task. Invariant causal prediction (ICP, Peters et al., 2016) is a method for causal feature selection that requires data from heterogeneous settings and exploits the invariance of causal models. ICP has been extended to general additive noise models and to nonparametric settings using conditional independence tests. However, conditional independence tests often suffer from low power (or poor type I error control), and additive noise models are not suitable for applications in which the response is not measured on a continuous scale but instead reflects categories or counts. Here, we develop transformation-model (TRAM) based ICP, allowing for continuous, categorical, count-type, and uninformatively censored responses (these model classes generally do not allow for identifiability without exogenous heterogeneity). As an invariance test, we propose TRAM-GCM, which is based on the expected conditional covariance between environments and score residuals and comes with uniform asymptotic level guarantees. For the special case of linear shift TRAMs, we also consider TRAM-Wald, which tests invariance based on the Wald statistic. We provide the open-source R package 'tramicp' and evaluate our approach on simulated data and in a case study investigating causal features of survival in critically ill patients.
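The subset search that ICP-style feature selection relies on (Peters et al., 2016) can be sketched as follows: test invariance for every subset of predictors and return the intersection of all subsets for which invariance is not rejected. This is a minimal illustration in Python under stated assumptions, not the 'tramicp' implementation. The names `icp` and `toy_pvalue` are hypothetical, the invariance test is passed in as a placeholder where a test such as TRAM-GCM or TRAM-Wald would be used, and returning `None` when every subset is rejected is a choice made only for this sketch.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, Optional


def icp(
    predictors: Iterable[str],
    invariance_pvalue: Callable[[FrozenSet[str]], float],
    alpha: float = 0.05,
) -> Optional[FrozenSet[str]]:
    """Intersect all predictor subsets whose invariance is not rejected.

    `invariance_pvalue` is a placeholder for any invariance test that maps
    a candidate set to a p-value. Returns None if every subset is rejected
    (an assumption of this sketch, not a documented convention).
    """
    names = sorted(predictors)
    accepted = []
    # Enumerate all subsets, including the empty set.
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            candidate = frozenset(subset)
            # Keep the candidate if invariance is not rejected at level alpha.
            if invariance_pvalue(candidate) > alpha:
                accepted.append(candidate)
    if not accepted:
        return None
    return frozenset.intersection(*accepted)


def toy_pvalue(candidate: FrozenSet[str]) -> float:
    """Hypothetical stand-in test: invariance holds iff 'x1' is included."""
    return 0.5 if "x1" in candidate else 0.001


result = icp(["x1", "x2", "x3"], toy_pvalue)
print(sorted(result))  # prints ['x1']
```

The search visits all 2^d subsets of d predictors, so this exhaustive form is only practical for a moderate number of candidate features.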