Discovering causal relations from observational data is an important problem, but unobserved variables (e.g., latent confounders or mediators) can mislead causal identification. To overcome this problem, proximal causal discovery methods attempt to adjust for the resulting bias using a proxy of the unobserved variable. In particular, hypothesis-test-based methods identify a causal edge by testing for the violation of linearity that the edge induces. However, these methods apply only to discrete data with strict constraints on the variables' levels, which limits their use in practice. In this paper, we address this limitation by extending the proximal hypothesis test to systems of continuous variables. Our strategy is to impose regularity conditions on the conditional distributions of the observed variables given the hidden factor, such that if the observed proxy of the hidden factor is discretized into finitely many, sufficiently fine bins, the resulting discretization error can be controlled. On this basis, the problem of testing continuous causal relations reduces to that of testing discrete causal relations in each bin, which existing methods can solve. The regularity conditions are non-parametric and mild, and they are satisfied by a wide range of structural causal models. Experiments on both simulated and real-world data show that our method recovers causal relations effectively when unobserved variables are present.
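The discretize-then-test idea can be illustrated with a minimal sketch. It is not the procedure of this paper: it assumes that the linearity in question is that of P(y | x) as a function of P(W | x) across levels of x, it uses equal-frequency bins with arbitrarily chosen bin counts, and it replaces a calibrated hypothesis test with a plain least-squares residual. The names `quantile_bins`, `conditional_table`, and `linearity_residual` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np


def quantile_bins(values, n_bins):
    """Map continuous values to integer labels 0..n_bins-1 (equal-frequency bins)."""
    # Interior quantiles serve as cut points; np.digitize returns the bin index.
    edges = np.quantile(values, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    return np.digitize(values, edges)


def conditional_table(target, given, n_target, n_given):
    """Estimate P(target = a | given = b) as an (n_target, n_given) frequency matrix."""
    counts = np.zeros((n_target, n_given))
    np.add.at(counts, (target, given), 1.0)  # accumulate joint counts
    col_totals = counts.sum(axis=0, keepdims=True)
    # Guard against empty bins of the conditioning variable.
    return counts / np.where(col_totals == 0.0, 1.0, col_totals)


def linearity_residual(x, y, w, n_x=8, n_y=2, n_w=3):
    """Residual norm of regressing P(y=0 | x) on P(W | x) across the x bins.

    Assumption: under the null of no causal edge, this residual should be
    close to zero up to sampling and discretization error. The value is
    NOT calibrated and is not a p-value.
    """
    xb = quantile_bins(x, n_x)
    yb = quantile_bins(y, n_y)
    wb = quantile_bins(w, n_w)
    p_y_given_x = conditional_table(yb, xb, n_y, n_x)  # shape (n_y, n_x)
    p_w_given_x = conditional_table(wb, xb, n_w, n_x)  # shape (n_w, n_x)
    design = p_w_given_x.T   # one row per x bin; rows sum to 1
    target = p_y_given_x[0]  # P(y in lowest bin | x bin)
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return float(np.linalg.norm(target - design @ coef))
```

With arrays `x`, `y`, and `w` of equal length, `linearity_residual(x, y, w)` returns a non-negative number. Turning that number into a decision would require a reference distribution for it, which this sketch deliberately leaves out.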