Various neural methods have been proposed for estimating causal effects from observational data. They typically assume that variables follow the same distribution and remain equally available at both the training and inference (i.e., runtime) stages. However, distribution shift (i.e., domain shift) can occur at runtime, and an even bigger challenge arises when variables become inaccessible. This is commonly caused by growing privacy and ethical concerns, which can make arbitrary variables unavailable across the entire runtime data and render imputation impractical. We term the co-occurrence of domain shift and inaccessible variables runtime domain corruption; it seriously impairs the generalizability of a trained counterfactual predictor. To counter runtime domain corruption, we subsume counterfactual prediction under the notion of domain adaptation. Specifically, we upper-bound the error w.r.t. the target domain (i.e., runtime covariates) by the sum of the source domain error and the inter-domain distribution distance. In addition, we build an adversarially unified variational causal effect model, named VEGAN, with a novel two-stage adversarial domain adaptation scheme that first reduces the latent distribution disparity between treated and control groups, and then between training and runtime variables. We demonstrate that, in the presence of runtime domain corruption, VEGAN outperforms other state-of-the-art baselines on individual-level treatment effect estimation on benchmark datasets.
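
The upper bound on the target domain error can be written schematically as follows. This is only a sketch of the generic shape of such a domain adaptation bound, not the exact statement proved in the paper. All symbols are notation assumed here for illustration: $h$ is the counterfactual predictor, $\epsilon_S$ and $\epsilon_T$ are its errors on the source (training) and target (runtime) domains, $p_S$ and $p_T$ are the corresponding covariate distributions, and $d$ is an unspecified distance between distributions.

```latex
% Schematic form only; the symbols and the choice of distance d are assumptions.
\[
  \epsilon_T(h) \;\le\; \epsilon_S(h) \;+\; d\left(p_S,\, p_T\right)
\]
```

Read this way, lowering the source error and the inter-domain distance together tightens the bound on the runtime error, which is consistent with the adversarial adaptation stages being used to shrink the distribution disparity.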