Heterogeneous treatment effect (HTE) estimation from observational data is challenging because of treatment selection bias. Existing methods address this bias by minimizing the distribution discrepancy between treatment groups in a latent space, which amounts to global alignment. However, local proximity, the property that similar units exhibit similar outcomes, is often overlooked. In this study, we propose Proximity-enhanced CounterFactual Regression (CFR-Pro), which exploits proximity to improve representation balancing for HTE estimation. Specifically, we introduce a pair-wise proximity regularizer based on optimal transport that incorporates local proximity into the discrepancy calculation. The curse of dimensionality, however, renders both the proximity measure and the discrepancy estimate ineffective, a problem exacerbated by the limited data typically available for HTE estimation. To handle this problem, we further develop an informative subspace projector, which sacrifices a small amount of distance precision in exchange for improved sample complexity. Extensive experiments demonstrate that CFR-Pro accurately matches units across treatment groups, effectively mitigates treatment selection bias, and significantly outperforms competing methods. Code is available at https://github.com/HowardZJU/CFR-Pro.
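To make the idea of an optimal-transport discrepancy with a pair-wise proximity term more concrete, the following is a minimal sketch and not the authors' implementation (see the repository above for that). The abstract does not specify the form of the regularizer, so the sketch assumes a Gromov-Wasserstein-style penalty that compares within-group distances of matched units. It also evaluates that penalty on a plain entropic (Sinkhorn) transport plan rather than optimizing both terms jointly, and it omits the subspace projector. All function names and the parameters `eps` and `kappa` are hypothetical.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two representation vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sinkhorn_plan(cost, eps=0.1, n_iters=200):
    """Entropic OT plan between uniform marginals (Sinkhorn iterations).

    `eps` is taken relative to the largest cost entry to avoid underflow.
    """
    n, m = len(cost), len(cost[0])
    scale = max(max(row) for row in cost) or 1.0
    K = [[math.exp(-c / (eps * scale)) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        u = [(1.0 / n) / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

def proximity_penalty(plan, treated, control):
    """Assumed Gromov-Wasserstein-style term: if t_i is matched to c_j and
    t_k to c_l, the distance d(t_i, t_k) should be close to d(c_j, c_l)."""
    n, m = len(treated), len(control)
    d_t = [[sq_dist(treated[i], treated[k]) for k in range(n)] for i in range(n)]
    d_c = [[sq_dist(control[j], control[l]) for l in range(m)] for j in range(m)]
    total = 0.0
    for i in range(n):
        for j in range(m):
            for k in range(n):
                for l in range(m):
                    gap = d_t[i][k] - d_c[j][l]
                    total += gap * gap * plan[i][j] * plan[k][l]
    return total

def proximity_discrepancy(treated, control, eps=0.1, kappa=1.0):
    """Transport cost plus kappa times the pair-wise proximity penalty."""
    cost = [[sq_dist(t, c) for c in control] for t in treated]
    plan = sinkhorn_plan(cost, eps=eps)
    transport = sum(plan[i][j] * cost[i][j]
                    for i in range(len(treated)) for j in range(len(control)))
    return transport + kappa * proximity_penalty(plan, treated, control)

# Toy representations: one control group overlaps the treated group,
# the other is shifted away from it.
treated = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
control_same = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
control_shift = [[3.0, 3.0], [4.0, 3.0], [3.0, 4.0]]

d_same = proximity_discrepancy(treated, control_same)
d_shift = proximity_discrepancy(treated, control_shift)
print(d_same, d_shift)
```

In a representation-balancing setup, a quantity of this kind would be added to the outcome-prediction loss so that the learned representations of the two treatment groups are pulled together. The quartic loop in `proximity_penalty` is only practical for very small batches.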