Under the prevalent potential outcomes model in causal inference, each unit is associated with multiple potential outcomes, of which at most one is observed, so many causal quantities are only partially identified. This inherent missing-data issue echoes the multi-marginal optimal transport (MOT) problem, where the marginal distributions are known but the way they couple to form the joint distribution is not. In this paper, we cast the causal partial identification problem in the framework of MOT with $K$ margins and $d$-dimensional outcomes and obtain the exact partially identified set. On the statistical side, to estimate the partially identified set via MOT, we establish a convergence rate for the plug-in MOT estimator under general quadratic objective functions and prove that it is minimax optimal for a quadratic objective function stemming from the variance minimization problem with arbitrary $K$ and $d \le 4$. On the numerical side, we demonstrate the efficacy of our method on several real-world datasets, where our proposal consistently outperforms the baseline by a significant margin (over 70%). In addition, we provide efficient off-the-shelf implementations of MOT with general objective functions.
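To make the connection between partial identification and optimal transport concrete, the following is a minimal sketch of the simplest special case only: $K = 2$ margins and a scalar outcome ($d = 1$), with the variance of the individual treatment effect $Y(1) - Y(0)$ as the target. It is not the implementation described in the abstract. In this case the mean of $Y(1) - Y(0)$ is fixed by the marginals, so the variance is minimized by the comonotone (quantile) coupling and maximized by the antimonotone coupling. The function name `variance_bounds` and the parameter `grid_size` are hypothetical names chosen for illustration.

```python
import numpy as np


def variance_bounds(y1, y0, grid_size=1000):
    """Plug-in bounds on Var(Y(1) - Y(0)) from two observed arms.

    Illustrative sketch for K = 2 and d = 1 only (an assumption made
    here, not the general method). Each arm's empirical quantile
    function is evaluated on a common grid of probability levels, so
    the two arms may have different sample sizes.
    """
    y1 = np.asarray(y1, dtype=float)
    y0 = np.asarray(y0, dtype=float)

    # Midpoint grid of probability levels in (0, 1).
    u = (np.arange(grid_size) + 0.5) / grid_size
    q1 = np.quantile(y1, u)
    q0 = np.quantile(y0, u)

    # Comonotone coupling pairs equal quantile levels: smallest variance.
    lower = np.var(q1 - q0)
    # Antimonotone coupling pairs level u with level 1 - u: largest variance.
    upper = np.var(q1 - q0[::-1])
    return lower, upper


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    treated = rng.normal(loc=1.0, scale=1.0, size=500)
    control = rng.normal(loc=0.0, scale=1.0, size=400)
    lo, hi = variance_bounds(treated, control)
    print(f"Var(Y(1) - Y(0)) lies in approximately [{lo:.3f}, {hi:.3f}]")
```

With more than two margins or multivariate outcomes, sorting no longer solves the problem and a genuine MOT solver is needed, which is the setting the paper addresses.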