Statistical prediction models are often trained on data drawn from probability distributions that differ from those they encounter in deployment. One approach to preparing for such shifts proactively builds on the intuition that causal mechanisms should remain invariant across environments. Here we focus on a challenging setting in which the causal and anticausal variables of the target are unobserved. Leaning on information theory, we develop feature selection and feature engineering techniques for the observed downstream variables that act as proxies. We identify proxies that help build stable models, and we additionally use auxiliary training tasks to extract stability-enhancing information from proxies. We demonstrate the effectiveness of our techniques on synthetic and real data.