Inferring causal structures from time series data is a central interest of many scientific inquiries. A major barrier to such inference is the problem of subsampling, i.e., the frequency of measurement is much lower than that of causal influence. Numerous methods have been proposed to overcome this problem, yet they are either limited to the linear case or fail to achieve identifiability. In this paper, we propose a constraint-based algorithm that can identify the entire causal structure from subsampled time series without any parametric constraint. Our observation is that the challenge of subsampling arises mainly from hidden variables at the unobserved time steps. Meanwhile, thanks to the temporal structure, every hidden variable has an observed proxy, which is essentially the same variable at some observable time in the future. Based on these two facts, we can leverage the proxies to remove the bias induced by the hidden variables and hence achieve identifiability. Following this intuition, we propose a proxy-based causal discovery algorithm that is nonparametric and achieves full causal identification. These theoretical advantages are reflected in both synthetic and real-world experiments.
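To make the subsampling setting concrete, the following is a minimal sketch of the data-generating situation only, not of the proposed algorithm. The toy two-variable process, its coefficients, the subsampling factor `k`, and the helper names `simulate`, `subsample`, and `proxy_time` are all assumptions introduced for illustration. The sketch shows which time steps become hidden and which later observed step serves as the proxy of a hidden one.

```python
import numpy as np


def simulate(n_steps, seed=0):
    """Toy process (hypothetical) with lag-1 edges X->X, Y->Y and X->Y."""
    rng = np.random.default_rng(seed)
    data = np.zeros((n_steps, 2))
    for t in range(1, n_steps):
        x_prev, y_prev = data[t - 1]
        data[t, 0] = 0.8 * x_prev + rng.normal(scale=0.5)
        # Nonlinear influence of X on Y, one step later.
        data[t, 1] = 0.5 * y_prev + np.tanh(x_prev) + rng.normal(scale=0.5)
    return data


def subsample(data, k):
    """Keep every k-th time step; the k-1 steps in between become hidden."""
    return data[::k]


def proxy_time(t, k):
    """Earliest observed time step strictly after the hidden step t."""
    return (t // k + 1) * k


k = 3
full = simulate(1000)
observed = subsample(full, k)

# Time steps of the full process that the subsampled series never sees.
hidden_steps = [t for t in range(len(full)) if t % k != 0]

# Hidden X at times 1, 1+k, ... paired with its observed proxy X at k, 2k, ...
hidden_x = full[1::k, 0]
proxy_x = full[k::k, 0]
n = min(len(hidden_x), len(proxy_x))
proxy_corr = np.corrcoef(hidden_x[:n], proxy_x[:n])[0, 1]

print("observed shape:", observed.shape)
print("proxy of hidden step 4 is observed step", proxy_time(4, k))
print("correlation between hidden X and its proxy:", proxy_corr)
```

Because X depends on its own past, the proxy stays statistically dependent on the hidden value, which is the property the abstract relies on. How that dependence is turned into conditional independence tests is specific to the paper and is not reproduced here.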