Models trained on one set of domains often suffer performance drops on unseen domains, e.g., when wildlife monitoring models are deployed in new camera locations. In this work, we study principles for designing data augmentations for out-of-domain (OOD) generalization. In particular, we focus on real-world scenarios in which some domain-dependent features are robust, i.e., some features that vary across domains are nonetheless predictive OOD. For example, in the wildlife monitoring application above, image backgrounds vary across camera locations but indicate habitat type, which helps predict the species of photographed animals. Motivated by theoretical analysis of a linear setting, we propose targeted augmentations, which selectively randomize spurious domain-dependent features while preserving robust ones. We prove that targeted augmentations improve OOD performance, allowing models to generalize better with fewer domains. In contrast, existing approaches such as generic augmentations, which fail to randomize domain-dependent features, and domain-invariant augmentations, which randomize all domain-dependent features, both perform poorly OOD. In experiments on three real-world datasets, we show that targeted augmentations set new state-of-the-art OOD performance, with improvements of 3.2-15.2%.
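As a rough illustration of the three augmentation families contrasted above, the following is a minimal sketch on tabular features, not the procedure used in this work. It assumes the feature columns have already been split into spurious domain-dependent and robust domain-dependent groups; the names `augment`, `spurious_idx`, and `robust_idx` are hypothetical, and resampling a column from a standard normal is only a stand-in for "randomizing" a feature.

```python
import numpy as np

def augment(x, mode, rng, spurious_idx, robust_idx):
    """Return an augmented copy of x with shape (n_samples, n_features).

    spurious_idx / robust_idx are hypothetical column indices for the
    spurious and robust domain-dependent features, assumed known.
    """
    x_aug = np.array(x, dtype=float)  # copies, so the input is not modified
    n = x_aug.shape[0]
    if mode == "generic":
        # Small perturbation everywhere; domain-dependent features are
        # left essentially intact rather than randomized.
        x_aug += rng.normal(scale=0.01, size=x_aug.shape)
    elif mode == "targeted":
        # Randomize only the spurious domain-dependent features and
        # preserve the robust ones.
        x_aug[:, spurious_idx] = rng.normal(size=(n, len(spurious_idx)))
    elif mode == "domain_invariant":
        # Randomize all domain-dependent features, robust ones included.
        idx = list(spurious_idx) + list(robust_idx)
        x_aug[:, idx] = rng.normal(size=(n, len(idx)))
    else:
        raise ValueError(f"unknown mode: {mode}")
    return x_aug

# Toy usage: column 0 is domain-independent, column 1 is robust
# domain-dependent, column 2 is spurious domain-dependent.
rng = np.random.default_rng(0)
x = np.ones((4, 3))
x_targeted = augment(x, "targeted", rng, spurious_idx=[2], robust_idx=[1])
x_invariant = augment(x, "domain_invariant", rng, spurious_idx=[2], robust_idx=[1])
x_generic = augment(x, "generic", rng, spurious_idx=[2], robust_idx=[1])
```

In this toy setup, only the targeted mode leaves the robust column untouched while discarding the spurious one, which mirrors the distinction the abstract draws between the three approaches.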