Estimating causal effects is vital for decision making. In standard causal effect estimation, treatments are usually binary- or continuous-valued. However, in many important real-world settings, treatments can be structured, high-dimensional objects, such as text, video, or audio. This poses a challenge for traditional causal effect estimation. While leveraging the shared structure across different treatments can help generalize to unseen treatments at test time, we show in this paper that using such structure blindly can lead to biased causal effect estimates. We address this challenge by devising a novel contrastive approach to learn a representation of the high-dimensional treatments, and we prove that it identifies the underlying causal factors and discards factors that are not causally relevant. We further prove that this treatment representation leads to unbiased estimates of the causal effect, and we empirically validate and benchmark our results on synthetic and real-world datasets.