Overlap, also known as positivity, is a key condition for causal treatment effect estimation. Many popular estimators suffer from high variance and become brittle when features differ strongly across treatment groups. The problem is especially acute in high dimensions, where the curse of dimensionality can make overlap implausible. To address this, we propose a class of feature representations called deconfounding scores, which preserve both identification and the target of estimation; the classical propensity and prognostic scores are two special cases. We characterize the problem of finding a representation with better overlap as minimizing an overlap divergence under a deconfounding score constraint. We then derive closed-form expressions for a class of deconfounding scores under a broad family of generalized linear models with Gaussian features, and show that prognostic scores are overlap-optimal within this class. We assess these findings empirically through extensive experiments.
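To make the overlap comparison between propensity and prognostic scores concrete, here is a minimal sketch built on our own assumptions rather than on the paper's construction. It assumes features X ~ N(0, I_d), a probit treatment model P(T=1 | X) = Φ(αᵀX), which is one generalized linear model with Gaussian features, and an outcome that depends on X only through the prognostic score βᵀX. Under this model both the propensity score and the prognostic score are valid adjustment representations. The coefficients α and β, the function names `simulate` and `extreme_fraction`, and the overlap proxy are hypothetical illustrations. The proxy is the share of units whose treatment probability given the representation falls outside [0.05, 0.95], and it is not the overlap divergence defined in the paper. The sketch computes P(T=1 | βᵀX) in closed form via the Gaussian identity E[Φ(Z)] = Φ(μ / √(1 + σ²)) for Z ~ N(μ, σ²).

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF (probit link)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def extreme_fraction(probs, eps=0.05):
    """Share of units whose treatment probability lies outside [eps, 1 - eps].

    A crude, hypothetical overlap proxy, not the paper's overlap divergence.
    """
    return sum(p < eps or p > 1 - eps for p in probs) / len(probs)


def simulate(n=2000, d=40, seed=0):
    """Compare overlap of two representations in an assumed probit/linear model."""
    rng = random.Random(seed)
    # Hypothetical coefficients: treatment depends weakly on every feature,
    # while the outcome E[Y | X, T] = beta.X + tau*T uses only two features.
    alpha = [0.5] * d                     # P(T=1 | X) = Phi(alpha.X)
    beta = [1.0, 1.0] + [0.0] * (d - 2)   # prognostic score s = beta.X

    # With X ~ N(0, I_d): alpha.X | beta.X = s  ~  N(c*s, v).
    ab, bb, aa = dot(alpha, beta), dot(beta, beta), dot(alpha, alpha)
    c = ab / bb              # regression slope of alpha.X on beta.X
    v = aa - ab * ab / bb    # residual variance of alpha.X given beta.X
    scale = math.sqrt(1.0 + v)

    prop_probs, prog_probs = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Representation 1: propensity score e(X), so P(T=1 | e(X)) = e(X).
        prop_probs.append(PHI(dot(alpha, x)))
        # Representation 2: prognostic score s = beta.X, so
        # P(T=1 | s) = E[Phi(Z)] with Z ~ N(c*s, v), i.e. Phi(c*s / sqrt(1 + v)).
        s = dot(beta, x)
        prog_probs.append(PHI(c * s / scale))
    return extreme_fraction(prop_probs), extreme_fraction(prog_probs)


if __name__ == "__main__":
    prop_frac, prog_frac = simulate()
    print(f"propensity score: {prop_frac:.3f} of units outside [0.05, 0.95]")
    print(f"prognostic score: {prog_frac:.3f} of units outside [0.05, 0.95]")
```

In this toy setting, treatment depends weakly on many features, so the propensity score pushes a large share of units toward probabilities near 0 or 1. The prognostic score discards features that are irrelevant to the outcome and keeps treatment probabilities close to 1/2. This illustrates, but does not prove, the overlap-optimality of prognostic scores stated in the abstract.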