Predicting the effect of interventions that admit many possible variations, e.g., therapeutic content that affects mental health outcomes or an earnings call transcript that moves a share price, is useful across several domains. However, classical causal estimators tend to assume that every possible intervention is observed, which is infeasible when interventions vary widely, for instance over the space of all text strings. We adapt a well-known approach that recasts causal inference as a learning problem so that it can address high-dimensional treatment spaces. Specifically, under standard assumptions such as no unobserved confounding, we show that causal error decomposes into a series of moment-balancing errors of increasing order, and we design objectives that directly improve causal estimation. We also show how to project the effect of a high-dimensional treatment onto lower-dimensional treatment attributes, which allows a single model to answer several causal questions without additional attribute-specific training. We empirically evaluate our estimators in settings with high-dimensional continuous, discrete, and text treatments, the last of which uses a semi-synthetic dataset of Amazon Reviews. Our experiments demonstrate the benefit of optimizing higher-order balance errors and show that projected causal estimates perform competitively with attribute-specific estimators.