Necessary and sufficient graphical conditions for optimal adjustment sets in causal graphical models with hidden variables

From arXiv; 41 pages. Published as a spotlight paper at the 35th Conference on Neural Information Processing Systems (NeurIPS 2021). This version has an updated Supplementary Material with corrected proofs (also updated in the NeurIPS proceedings).

The problem of selecting optimal backdoor adjustment sets to estimate causal effects in graphical models with hidden and conditioned variables is addressed. Previous work has defined optimality as achieving the smallest asymptotic estimation variance and derived an optimal set for the case without hidden variables. For the case with hidden variables, there can be settings where no optimal set exists, and currently only a sufficient graphical optimality criterion of limited applicability has been derived. In the present work, optimality is characterized as maximizing a certain adjustment information, which allows deriving a necessary and sufficient graphical criterion for the existence of an optimal adjustment set, together with a definition of this set and an algorithm to construct it. Further, the optimal set is valid if and only if a valid adjustment set exists, and for any graph it has higher (or equal) adjustment information than the Adjust-set proposed in Perković et al. [Journal of Machine Learning Research, 18: 1–62, 2018]. The results translate to minimal asymptotic estimation variance for a class of estimators whose asymptotic variance follows a certain information-theoretic relation. Numerical experiments indicate that the asymptotic results also hold for relatively small sample sizes and that the optimal adjustment set or minimized variants thereof often yield lower variance also beyond that estimator class. Surprisingly, among the randomly created setups more than 90% fulfill the optimality conditions, indicating that graphical optimality may also hold in many real-world scenarios. Code is available as part of the Python package tigramite (https://github.com/jakobrunge/tigramite).
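To make concrete why the choice among valid adjustment sets matters for estimation variance, the following is a minimal simulation sketch. It does not use tigramite and is not the paper's algorithm. It assumes a hypothetical linear Gaussian model without hidden variables, W -> X -> Y <- Z, in which the empty set, {Z}, and {W} are all valid adjustment sets. The function names, coefficients, and sample sizes are illustrative assumptions.

```python
import numpy as np


def ols_coef_on_x(x, y, covariates):
    """OLS coefficient of x when regressing y on [x, covariates, intercept]."""
    columns = [x] + list(covariates) + [np.ones_like(x)]
    design = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[0]


def simulate(n_samples, rng):
    """Draw one dataset from the assumed toy model W -> X -> Y <- Z."""
    w = rng.normal(size=n_samples)                       # cause of X only
    z = rng.normal(size=n_samples)                       # cause of Y only
    x = 1.0 * w + rng.normal(size=n_samples)
    y = 0.5 * x + 1.0 * z + rng.normal(size=n_samples)   # true effect of X on Y: 0.5
    return w, z, x, y


def estimator_variances(n_reps=500, n_samples=200, seed=0):
    """Monte Carlo variance of the effect estimate for three valid adjustment sets."""
    rng = np.random.default_rng(seed)
    estimates = {"empty": [], "Z": [], "W": []}
    for _ in range(n_reps):
        w, z, x, y = simulate(n_samples, rng)
        estimates["empty"].append(ols_coef_on_x(x, y, []))
        estimates["Z"].append(ols_coef_on_x(x, y, [z]))
        estimates["W"].append(ols_coef_on_x(x, y, [w]))
    return {name: float(np.var(values)) for name, values in estimates.items()}


if __name__ == "__main__":
    for name, variance in estimator_variances().items():
        print(f"adjustment set {name}: estimator variance {variance:.5f}")
```

In this toy model, standard OLS theory gives an asymptotic variance proportional to the residual variance of Y divided by the conditional variance of X given the adjustment set. Adjusting for Z therefore lowers the variance relative to the empty set, while adjusting for W raises it, roughly in the ratio 0.5 : 1 : 2. The setting with hidden variables treated in the paper is more involved and is not covered by this sketch.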
