Statistical inference of directed relations is challenging when some interventions are unspecified, that is, when the intervention targets are unknown. In this article, we test hypothesized directed relations under unspecified interventions. First, we derive conditions under which the model is identifiable. Unlike classical inference, testing directed relations requires identifying the ancestors and relevant interventions of the hypothesis-specific primary variables. To this end, we propose a peeling algorithm based on nodewise regressions to establish a topological order of the primary variables. Moreover, we prove that the peeling algorithm yields a consistent estimator in low-order polynomial time. Second, we propose a likelihood ratio test integrated with a data perturbation scheme to account for the uncertainty in identifying ancestors and interventions. We also show that the distribution of the data perturbation test statistic converges to the target distribution. Numerical examples, including an application to inferring gene regulatory networks, demonstrate the utility and effectiveness of the proposed methods. The R implementation is available at https://github.com/chunlinli/intdag.
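To make the peeling idea concrete, the following is a minimal Python sketch of how a topological order might be recovered from nodewise regressions of the primary variables on the intervention variables. It is not the authors' R implementation and does not reproduce the intdag package. The linear model, the toy graph, the plain least-squares fit, the hard threshold `tol`, and the assumption that every primary variable has its own intervention targeting it alone are all assumptions of this sketch, as are the names `peel_topological_order`, `U`, `W`, and `V_hat`.

```python
import numpy as np


def peel_topological_order(V, tol=0.3):
    """Return an order of the primary variables with parents before children.

    V[l, j] is the estimated total effect of intervention X_l on variable Y_j.
    Assumption of this sketch: an intervention whose thresholded row has a
    single nonzero entry among the remaining variables marks a current leaf.
    """
    support = np.abs(V) > tol
    remaining = list(range(V.shape[1]))
    peeled = []
    while remaining:
        # Count how many remaining variables each intervention still affects.
        counts = support[:, remaining].sum(axis=1)
        rows = np.flatnonzero(counts == 1)
        if rows.size == 0:
            raise ValueError("no leaf found; the instrument assumption may fail")
        # The single affected variable of each such intervention is a leaf.
        leaves = {remaining[int(np.argmax(support[l, remaining]))] for l in rows}
        peeled.extend(sorted(leaves))
        remaining = [j for j in remaining if j not in leaves]
    # Leaves were peeled first, so reverse to list roots first.
    return peeled[::-1]


# Toy data (hypothetical): Y = U^T Y + W^T X + noise, with U[k, j] != 0 for k -> j.
rng = np.random.default_rng(0)
p, n = 4, 5000
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
U = np.zeros((p, p))
for k, j in edges:
    U[k, j] = 0.8
W = np.eye(p)  # assumed: intervention l targets variable l only
X = rng.normal(size=(n, p))
E = rng.normal(size=(n, p))
Y = (X @ W + E) @ np.linalg.inv(np.eye(p) - U)

# Nodewise regressions: each column of Y regressed on all interventions.
V_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
order = peel_topological_order(V_hat)
print(order)
```

In this toy setting the printed order lists every parent before its children. Ordinary least squares with a fixed threshold is used only to keep the sketch short; it is a stand-in chosen for illustration and would not be expected to scale to large graphs.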