Subset verification and search algorithms for causal DAGs

Learning causal relationships between variables is a fundamental task in causal inference, and directed acyclic graphs (DAGs) are a popular choice for representing these relationships. Because observational data alone can recover a causal graph only up to its Markov equivalence class, interventions are often used for the recovery task. Interventions are generally costly, so it is important to design algorithms that minimize the number of interventions performed. In this work, we study the problem of identifying the smallest set of interventions required to learn the causal relationships for a given subset of edges (the target edges). Under the assumptions of faithfulness, causal sufficiency, and ideal interventions, we study this problem in two settings: when the underlying ground-truth causal graph is known (subset verification) and when it is unknown (subset search). For the subset verification problem, we provide an efficient algorithm that computes a minimum-sized interventional set; we further extend these results to bounded-size non-atomic interventions and node-dependent interventional costs. For the subset search problem, we show that, in the worst case, no algorithm (even with adaptivity or randomization) can achieve an approximation ratio, measured against the subset verification number, that is asymptotically better than the vertex cover size of the target edges. This result is surprising because a logarithmic approximation algorithm exists for the search problem when the goal is to recover the whole causal graph. To obtain our results, we prove several structural properties of interventional causal graphs that we believe have applications beyond the subset verification and search problems studied here.
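
As an illustration of why observations alone identify a graph only up to its Markov equivalence class, the following minimal sketch uses the standard characterization that two DAGs on the same vertex set are Markov equivalent exactly when they share the same skeleton and the same v-structures. It is not taken from the paper; the function names and the edge-list representation (pairs `(parent, child)`) are assumptions made for this example.

```python
from itertools import combinations

# Hypothetical helpers for illustration; a DAG is a list of (parent, child) pairs.

def skeleton(edges):
    """Undirected version of the graph: each edge as an unordered pair."""
    return {frozenset(edge) for edge in edges}

def v_structures(edges):
    """All colliders a -> child <- b where a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, ps in parents.items():
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found

def markov_equivalent(edges1, edges2):
    """Same skeleton and same v-structures (assumes the same vertex set)."""
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))

chain = [("A", "B"), ("B", "C")]           # A -> B -> C
reversed_chain = [("C", "B"), ("B", "A")]  # A <- B <- C
collider = [("A", "B"), ("C", "B")]        # A -> B <- C

chain_vs_reversed = markov_equivalent(chain, reversed_chain)
chain_vs_collider = markov_equivalent(chain, collider)
```

Here the chain and its reversal fall in the same equivalence class, so observational data cannot tell them apart and an intervention is needed to orient the edges, whereas the collider is distinguishable from both.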
