The problem of domain generalization concerns learning predictive models that are robust to distribution shifts when deployed in new, previously unseen environments. Existing methods typically require labeled data from multiple training environments, limiting their applicability when labeled data are scarce. In this work, we study domain generalization in an anti-causal setting, where the outcome causes the observed covariates. Under this structure, environment perturbations that affect the covariates do not propagate to the outcome, which motivates regularizing the model's sensitivity to these perturbations. Crucially, estimating these perturbation directions does not require labels, enabling us to leverage unlabeled data from multiple environments. We propose two methods that penalize the model's sensitivity to variations in the mean and covariance of the covariates across environments, respectively, and prove that these methods have worst-case optimality guarantees under certain classes of environments. Finally, we demonstrate the empirical performance of our approach on a controlled physical system and a physiological signal dataset.
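To make the regularization idea concrete, here is a minimal sketch of one way such a penalty could look. The data, the linear model, and the function `mean_shift_penalty` are all hypothetical illustrations, not the paper's actual method: per-environment means of unlabeled covariates give an estimated perturbation direction, and the penalty discourages the model from being sensitive to shifts along it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unlabeled covariates from two environments (synthetic, for illustration):
# environment 2 is mean-shifted along the first coordinate.
X_e1 = rng.normal(loc=[0.0, 0.0, 0.0], size=(500, 3))
X_e2 = rng.normal(loc=[1.0, 0.0, 0.0], size=(500, 3))

# Estimated mean-shift direction between environments -- no labels required.
delta = X_e2.mean(axis=0) - X_e1.mean(axis=0)

def mean_shift_penalty(w, delta, lam=1.0):
    """For a linear model f(x) = w @ x, the sensitivity to a mean shift
    along `delta` is w @ delta; the squared penalty pushes the model
    toward directions that are stable across environments."""
    return lam * float(w @ delta) ** 2

w = np.array([0.5, 1.0, -0.5])
penalty = mean_shift_penalty(w, delta)
```

A covariance-based variant would analogously compare per-environment covariance matrices and penalize the model's sensitivity along the directions in which they differ.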