Discovering causal relations from observational data is important. Unobserved variables, such as latent confounders or mediators, can mislead causal identification. To address this issue, proximal causal discovery methods have been proposed that adjust for the resulting bias using a proxy of the unobserved variable. However, these methods presume that the data are discrete, which limits their real-world application. In this paper, we propose a proximal causal discovery method that handles continuous variables. We observe that discretizing continuous variables can lead to serious errors and compromise the power of the proxy. Therefore, to use proxy variables in the continuous case, the critical point is to control the discretization error. To this end, we identify mild regularity conditions on the conditional distributions under which the discretization error can be made arbitrarily small, as long as the proxy is discretized into sufficiently fine, finite bins. Based on this, we design a proxy-based hypothesis test for identifying causal relationships when unobserved variables are present. Our test is consistent, meaning that its power approaches one as the sample size grows. We demonstrate the effectiveness of our method using synthetic and real-world data.
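The role of the discretization error can be illustrated with a small simulation. The following is a toy sketch under our own assumptions, not the proposed test: the function name `residual_dependence`, the uniform distribution of the conditioning variable `w`, and the additive Gaussian noise model are all hypothetical choices made for illustration, and `w` is conditioned on directly rather than through a proxy. Here `x` and `y` are conditionally independent given `w`, so any within-bin correlation that remains after binning `w` is purely an artifact of discretization.

```python
import numpy as np


def residual_dependence(n_bins, n_samples=20_000, noise_sd=0.1, seed=0):
    """Weighted average within-bin correlation of x and y after binning w.

    Toy model (an assumption for illustration): w ~ Uniform(0, 1),
    x = w + noise, y = w + noise, so x and y are independent given w.
    """
    rng = np.random.default_rng(seed)
    w = rng.uniform(0.0, 1.0, n_samples)
    x = w + rng.normal(0.0, noise_sd, n_samples)
    y = w + rng.normal(0.0, noise_sd, n_samples)

    # Equal-width bins on [0, 1]; labels take values 0 .. n_bins - 1.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    labels = np.digitize(w, edges[1:-1])

    total = 0.0
    for b in range(n_bins):
        mask = labels == b
        if mask.sum() < 3:
            continue  # too few points to estimate a correlation
        # Weight each bin's correlation by the fraction of samples in it.
        total += mask.mean() * np.corrcoef(x[mask], y[mask])[0, 1]
    return total


if __name__ == "__main__":
    for k in (2, 5, 20, 50):
        print(k, round(residual_dependence(k), 3))
```

In this toy model, two bins leave a strong spurious correlation between `x` and `y`, while finer binning shrinks it toward zero, which matches the intuition that the discretization error must be controlled before a discrete-data test can be trusted.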