We consider the task of identifying the causal parents of a target variable among a set of candidate variables from observational data. Our main assumption is that the candidate variables are observed in different environments, which may, for example, correspond to different settings of a machine or different time intervals in a dynamical process. Under certain assumptions, different environments can be regarded as interventions on the observed system. We assume a linear relationship between the target and the covariates that may differ from one environment to the next, the only restriction being that the causal structure is invariant across environments. This extends the ICP ($\textbf{I}$nvariant $\textbf{C}$ausal $\textbf{P}$rediction) principle of Peters et al. [2016], who assumed a fixed linear relationship across all environments. Within our proposed setting, we provide sufficient conditions for the identifiability of the causal parents and introduce a practical method called LoLICaP ($\textbf{Lo}$cally $\textbf{L}$inear $\textbf{I}$nvariant $\textbf{Ca}$usal $\textbf{P}$rediction), which identifies parents through a hypothesis test based on a ratio of minimum and maximum statistics. We then show, in a simplified setting, that the statistical power of LoLICaP converges exponentially fast in the sample size, and finally we analyze the behavior of LoLICaP experimentally in more general settings.
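To make the setting concrete, the following is a minimal Python sketch of the ICP-style search that this setting builds on: score every candidate subset for invariance, keep the subsets that pass, and return their intersection. The abstract does not specify LoLICaP's test statistic, so the score used here is a hypothetical stand-in and not the paper's method. Coefficients are refit separately in each environment, which mirrors the locally linear assumption, and the min/max ratio of per-environment residual variances is compared against a fixed `threshold` rather than a calibrated level-α test. All function names (`residual_variance`, `min_max_ratio`, `invariant_parent_estimate`, `simulate`) and the toy data-generating process are illustrative assumptions.

```python
import itertools

import numpy as np


def residual_variance(X, y):
    """Residual variance of an OLS fit (with intercept) of y on the columns of X."""
    # column_stack also handles X with zero columns, giving an intercept-only fit.
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.var(y - A @ coef))


def min_max_ratio(X, y, env, subset):
    """Hypothetical invariance score: min/max of per-environment residual variances.

    Coefficients are refit in each environment, so only the noise distribution,
    not the linear relationship itself, has to be the same across environments.
    """
    cols = list(subset)
    variances = [
        residual_variance(X[env == e][:, cols], y[env == e]) for e in np.unique(env)
    ]
    return min(variances) / max(variances)


def invariant_parent_estimate(X, y, env, threshold=0.8):
    """ICP-style search: intersect all subsets whose score passes the threshold."""
    d = X.shape[1]
    accepted = [
        set(S)
        for k in range(d + 1)
        for S in itertools.combinations(range(d), k)
        if min_max_ratio(X, y, env, S) >= threshold
    ]
    return set.intersection(*accepted) if accepted else set()


def simulate(n_per_env=2000, seed=0):
    """Toy data: X0 -> Y -> X1, where the X0 -> Y coefficient changes per environment."""
    rng = np.random.default_rng(seed)
    scales, betas = [1.0, 2.0, 3.0], [1.0, 2.0, -1.5]
    Xs, ys, envs = [], [], []
    for e, (s, b) in enumerate(zip(scales, betas)):
        x0 = rng.normal(0.0, s, n_per_env)            # shifted parent distribution
        y = b * x0 + rng.normal(0.0, 1.0, n_per_env)  # same noise law in every environment
        x1 = y + rng.normal(0.0, 2.0, n_per_env)      # child of the target
        Xs.append(np.column_stack([x0, x1]))
        ys.append(y)
        envs.append(np.full(n_per_env, e))
    return np.vstack(Xs), np.concatenate(ys), np.concatenate(envs)


X, y, env = simulate()
print(invariant_parent_estimate(X, y, env))  # expected: {0}
```

In this toy example both {0} and {0, 1} have residual variances that do not depend on the environment (about 1.0 and 0.8, respectively), whereas the empty set and {1} do not. The intersection therefore recovers the parent X0 even though its coefficient differs in every environment. A fixed-coefficient ICP regression pooled across environments would not, in general, see this invariance.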