Model-based causal feature selection for general response types

Discovering causal relationships from observational data is a fundamental yet challenging task. Invariant causal prediction (ICP, Peters et al., 2016) is a method for causal feature selection that requires data from heterogeneous settings and exploits the invariance of causal models across those settings. ICP has been extended to general additive noise models and to nonparametric settings using conditional independence tests. However, the latter often suffer from low power (or poor type I error control), and additive noise models are not suitable for applications in which the response is not measured on a continuous scale but instead reflects categories or counts. Here, we develop transformation-model (TRAM) based ICP, which allows for continuous, categorical, count-type, and uninformatively censored responses (in general, these model classes do not allow for identifiability when there is no exogenous heterogeneity). As an invariance test, we propose TRAM-GCM, which is based on the expected conditional covariance between environments and score residuals and comes with uniform asymptotic level guarantees. For the special case of linear shift TRAMs, we also consider TRAM-Wald, which tests invariance based on the Wald statistic. We provide an open-source R package 'tramicp' and evaluate our approach on simulated data and in a case study investigating causal features of survival in critically ill patients.
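To make the ICP idea concrete, the following is a minimal Python sketch under simplifying assumptions. It is not TRAM-GCM and does not use the 'tramicp' API. It replaces the transformation model with ordinary least squares, so the score residual reduces to the ordinary regression residual, and it fits the binary environment indicator by a linear regression as well. The function names (`residuals`, `invariance_pvalue`, `icp`, `simulate`) and the simulated data-generating process are hypothetical and chosen only for illustration. For every candidate subset of predictors, the sketch tests whether the product of response residuals and environment residuals has mean zero, and it returns the intersection of all subsets for which invariance is not rejected.

```python
import itertools
import math

import numpy as np


def residuals(target, design):
    """OLS residuals of `target` on `design` (design already contains an intercept)."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef


def invariance_pvalue(y, X, env, subset):
    """GCM-style p-value for H0: E[(residual of y) * (residual of env)] = 0 given X[:, subset]."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
    products = residuals(y, design) * residuals(env, design)
    stat = math.sqrt(n) * products.mean() / products.std(ddof=1)
    # Two-sided p-value under a standard normal reference distribution.
    return math.erfc(abs(stat) / math.sqrt(2.0))


def icp(y, X, env, alpha=0.05):
    """Intersect all predictor subsets whose invariance test is not rejected at level alpha."""
    p = X.shape[1]
    pvalues = {}
    for size in range(p + 1):
        for subset in itertools.combinations(range(p), size):
            pvalues[subset] = invariance_pvalue(y, X, env, subset)
    accepted = [set(s) for s, pv in pvalues.items() if pv > alpha]
    # If no subset is accepted, the sketch returns the empty set.
    selected = set.intersection(*accepted) if accepted else set()
    return selected, pvalues


def simulate(n=2000, seed=0):
    """Toy data (an assumption for illustration): X1 is a parent of Y, X2 is a child of Y."""
    rng = np.random.default_rng(seed)
    env = rng.integers(0, 2, size=n).astype(float)
    x1 = 2.0 * env + rng.normal(size=n)      # environment shifts the parent
    y = 1.5 * x1 + rng.normal(size=n)        # invariant mechanism for Y given X1
    x2 = y + 2.0 * env + rng.normal(size=n)  # child of Y, also shifted by the environment
    return y, np.column_stack([x1, x2]), env


if __name__ == "__main__":
    y, X, env = simulate()
    selected, pvalues = icp(y, X, env, alpha=0.01)
    print("selected predictor indices:", selected)
    for subset, pv in pvalues.items():
        print(subset, pv)
```

In this toy setup only the subset containing the first predictor satisfies invariance, so the returned set is either that single index or, if the test happens to reject the true parent set, empty. Taking the intersection of accepted subsets is what keeps the procedure conservative: a predictor is selected only if every accepted subset contains it.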


