Making predictions in an unseen environment given data from multiple training environments is a challenging task. We approach this problem from an invariance perspective, focusing on binary classification to shed light on general nonlinear data generation mechanisms. We identify a unique form of invariance that exists only in the binary setting and that allows us to train models that are invariant across environments. We provide sufficient conditions for this invariance and show that it is robust even when environmental conditions vary greatly. Our formulation admits a causal interpretation, which allows us to compare it with various frameworks. Finally, we propose a heuristic prediction method and conduct experiments on real and synthetic datasets.