We study a fundamental question of domain generalization: given a family of domains (i.e., data distributions), how many randomly sampled domains do we need to collect data from in order to learn a model that performs reasonably well on every seen and unseen domain in the family? We model this problem in the PAC framework and introduce a new combinatorial measure, which we call the domain shattering dimension. We show that this dimension characterizes the domain sample complexity. Furthermore, we establish a tight quantitative relationship between the domain shattering dimension and the classic VC dimension, demonstrating that every hypothesis class that is learnable in the standard PAC setting is also learnable in our setting.
翻译:我们研究域泛化的一个基本问题:给定一族域(即数据分布),需要从多少个随机采样的域中收集数据,才能学习一个在族内所有已知和未知域上表现合理的模型?我们在PAC框架下对该问题进行建模,并引入一种新的组合度量——我们称之为域破碎维度。我们证明该维度刻画了域的样本复杂度。此外,我们建立了域破碎维度与经典VC维度之间的紧致定量关系,表明每一个在标准PAC设定下可学习的假设类,在我们的设定下同样可学习。