Counterfactual regression (CFR) is a promising method for individualized treatment effect (ITE) estimation: it maps individuals' covariates to a latent space and predicts their counterfactual outcomes. However, selection bias between the control and treatment groups often leaves the two groups' latent distributions imbalanced, which degrades the method's performance. In this study, we revisit counterfactual regression through the lens of the information bottleneck and propose a novel learning paradigm called the Gromov-Wasserstein information bottleneck (GWIB). In this paradigm, we learn CFR by maximizing the mutual information between the covariates' latent representations and the outcomes while penalizing the kernelized mutual information between the latent representations and the covariates. We demonstrate that the upper bound of the penalty term can be implemented as a new regularizer consisting of $i)$ the fused Gromov-Wasserstein distance between the latent representations of different groups and $ii)$ the gap between the transport cost generated by the model and the cross-group Gromov-Wasserstein distance between the latent representations and the covariates. GWIB learns the CFR model through alternating optimization, suppressing selection bias while avoiding trivial latent distributions. Experiments on ITE estimation tasks show that GWIB consistently outperforms state-of-the-art CFR methods. To support the research community, we release our project at https://github.com/peteryang1031/Causal-GWIB.
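For readers unfamiliar with the quantity the regularizer is built on, the following is a minimal sketch of the plain (non-fused) Gromov-Wasserstein transport cost of a fixed coupling between two point clouds, for example the latent representations of the control and treatment groups. It is an illustration under stated assumptions, not the released implementation: the function names `pairwise_sq_dists` and `gw_cost`, the squared-Euclidean ground cost, and the uniform product coupling are all chosen here for the example, and no optimization over couplings is performed.

```python
import numpy as np


def pairwise_sq_dists(x):
    """Squared Euclidean distance matrix within one point cloud (assumed ground cost)."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def gw_cost(d_a, d_b, coupling):
    """Gromov-Wasserstein cost of a given coupling T (hypothetical helper):

        sum_{i,j,k,l} (d_a[i, k] - d_b[j, l])**2 * T[i, j] * T[k, l]

    The square is expanded so that no four-index tensor is ever formed.
    """
    p = coupling.sum(axis=1)  # marginal over the first point cloud
    q = coupling.sum(axis=0)  # marginal over the second point cloud
    term_a = p @ (d_a ** 2) @ p
    term_b = q @ (d_b ** 2) @ q
    cross = np.sum((d_a @ coupling @ d_b.T) * coupling)
    return term_a + term_b - 2.0 * cross


# Toy latent representations for two groups of different sizes (synthetic data).
rng = np.random.default_rng(0)
z_control = rng.normal(size=(6, 4))
z_treated = rng.normal(size=(5, 4))

d_control = pairwise_sq_dists(z_control)
d_treated = pairwise_sq_dists(z_treated)

# Uniform product coupling: a feasible but generally non-optimal transport plan.
coupling = np.full((6, 5), 1.0 / (6 * 5))
cost = gw_cost(d_control, d_treated, coupling)
```

The cost of any feasible coupling upper-bounds the Gromov-Wasserstein distance, which is the minimum of this cost over all couplings with the prescribed marginals. The fused variant mentioned in the abstract additionally includes a cross-domain feature-matching term, which this sketch omits.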