Subset verification and search algorithms for causal DAGs

Learning causal relationships between variables is a fundamental task in causal inference, and directed acyclic graphs (DAGs) are a popular choice for representing these relationships. Because observational data can recover a causal graph only up to its Markov equivalence class, interventions are often used to complete the recovery. Interventions are generally costly, so it is important to design algorithms that minimize the number of interventions performed. In this work, we study the problem of identifying the smallest set of interventions required to learn the causal relationships encoded by a subset of edges (the target edges). Under the assumptions of faithfulness, causal sufficiency, and ideal interventions, we study this problem in two settings: when the underlying ground-truth causal graph is known (subset verification) and when it is unknown (subset search). For the subset verification problem, we provide an efficient algorithm to compute a minimum-sized interventional set; we further extend these results to bounded-size non-atomic interventions and node-dependent interventional costs. For the subset search problem, we show that, in the worst case, no algorithm (even with adaptivity or randomization) can achieve an approximation ratio, measured against the subset verification number, that is asymptotically better than the size of a minimum vertex cover of the target edges. This result is surprising because a logarithmic approximation algorithm exists for the search problem when the goal is to recover the whole causal graph. To obtain our results, we prove several structural properties of interventional causal graphs that we believe have applications beyond the subset verification and search problems studied here.
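To make the role of the Markov equivalence class concrete, the following is a minimal sketch (not the algorithm of this work) of the classical characterization that two DAGs on the same vertex set are Markov equivalent exactly when they share the same skeleton and the same v-structures. The function names and the edge-list representation are assumptions made for illustration only.

```python
from itertools import combinations

def skeleton(edges):
    """Undirected version of a DAG given as a list of (parent, child) pairs."""
    return {frozenset(e) for e in edges}

def v_structures(edges):
    """Triples (a, c, b) with a -> c <- b where a and b are non-adjacent."""
    skel = skeleton(edges)
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    found = set()
    for child, ps in parents.items():
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found

def markov_equivalent(edges1, edges2):
    """Assumes both DAGs are defined over the same vertex set."""
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))

chain = [("X", "Y"), ("Y", "Z")]            # X -> Y -> Z
reversed_chain = [("Z", "Y"), ("Y", "X")]   # X <- Y <- Z
collider = [("X", "Y"), ("Z", "Y")]         # X -> Y <- Z

print(markov_equivalent(chain, reversed_chain))  # True
print(markov_equivalent(chain, collider))        # False
```

The two chains are indistinguishable from observational data alone, which is why interventions are needed to orient their edges; the collider is already identified by its v-structure.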
