Domain generalization (DG) aims to train models that generalize well under domain shift. Previous research on DG has been conducted mostly in single-source or multi-source settings. In this paper, we consider a third, lesser-known setting where a training domain is endowed with a collection of pairs of examples that share the same semantic information. Such semantic sharing (SS) pairs can be created via data augmentation and then utilized for consistency regularization (CR). We present a theory showing that CR is conducive to DG and propose a novel CR method called Logit Attribution Matching (LAM). We conduct experiments on five DG benchmarks and four pretrained models, with SS pairs created by both generic and targeted data augmentation methods. LAM outperforms representative single-/multi-source DG methods as well as various CR methods that leverage SS pairs. The code and data of this project are available at https://github.com/Gaohan123/LAM.