Accurate estimates of causal effects play a key role in decision-making across applications such as healthcare, economics, and operations. In the absence of randomized experiments, a common approach to estimating causal effects uses \textit{covariate adjustment}. In this paper, we study covariate adjustment for discrete distributions from the PAC learning perspective, assuming knowledge of a valid adjustment set $\bZ$, which might be high-dimensional. Our first main result PAC-bounds the estimation error of covariate adjustment by a term that is exponential in the size of the adjustment set; it is known that such a dependency is unavoidable even if one only aims to minimize the mean squared error. Motivated by this result, we introduce the notion of an \emph{$\eps$-Markov blanket}, give bounds on the misspecification error of using such a set for covariate adjustment, and provide an algorithm for $\eps$-Markov blanket discovery; our second main result upper bounds the sample complexity of this algorithm. Furthermore, we provide a misspecification error bound and a constraint-based algorithm that allow us to go beyond $\eps$-Markov blankets to even smaller adjustment sets. Our third main result upper bounds the sample complexity of this algorithm, and our final result combines the first three into an overall PAC bound. Altogether, our results highlight that one does not need to perfectly recover causal structure in order to ensure accurate estimates of causal effects.
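
For concreteness, covariate adjustment is commonly written in the following standard form. This is only a sketch: the treatment $X$, the outcome $Y$, and the lowercase values $x$, $y$, $\mathbf{z}$ are notation assumed here for illustration and are not fixed by the text above.
\[
  P\bigl(Y = y \mid \mathrm{do}(X = x)\bigr) \;=\; \sum_{\mathbf{z}} P\bigl(Y = y \mid X = x,\, \bZ = \mathbf{z}\bigr)\, P\bigl(\bZ = \mathbf{z}\bigr).
\]
If, say, each of the $|\bZ|$ adjustment variables is binary, the sum ranges over $2^{|\bZ|}$ configurations, and a plug-in estimator must estimate one conditional distribution per configuration. This gives some intuition for why the estimation error can scale exponentially in the size of the adjustment set, and why smaller adjustment sets are attractive.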