Out-of-distribution (OOD) generalization is a favorable yet challenging property for deep neural networks. The core challenges lie in the limited availability of source domains that help models learn an invariant representation from the spurious features. Various domain augmentation have been proposed but largely rely on interpolating existing domains and frequently face difficulties in creating truly "novel" domains. Humans, on the other hand, can easily extrapolate novel domains, thus, an intriguing question arises: How can neural networks extrapolate like humans and achieve OOD generalization? We introduce a novel approach to domain extrapolation that leverages reasoning ability and the extensive knowledge encapsulated within large language models (LLMs) to synthesize entirely new domains. Starting with the class of interest, we query the LLMs to extract relevant knowledge for these novel domains. We then bridge the gap between the text-centric knowledge derived from LLMs and the pixel input space of the model using text-to-image generation techniques. By augmenting the training set of domain generalization datasets with high-fidelity, photo-realistic images of these new domains, we achieve significant improvements over all existing methods, as demonstrated in both single and multi-domain generalization across various benchmarks. With the ability to extrapolate any domains for any class, our method has the potential to learn a generalized model for any task without any data. To illustrate, we put forth a much more difficult setting termed, data-free domain generalization, that aims to learn a generalized model in the absence of any collected data. Our empirical findings support the above argument and our methods exhibit commendable performance in this setting, even surpassing the supervised setting by approximately 1-2\% on datasets such as VLCS.
翻译:分布外泛化是深度神经网络的一个理想但具有挑战性的特性。其核心难点在于源域数据的有限性,这些数据帮助模型从虚假特征中学习不变表征。现有多种域增强方法主要依赖于对现有域进行插值,但往往难以创建真正“新颖”的域。相比之下,人类可以轻松外推至全新领域,因此一个引人深思的问题随之产生:神经网络如何像人类一样进行外推并实现分布外泛化?我们提出了一种新颖的域外推方法,该方法利用大型语言模型的推理能力和海量知识来合成全新领域。以目标类别为起点,我们通过查询大型语言模型提取与这些新领域相关的知识。随后,利用文本到图像生成技术,弥合大型语言模型生成的以文本为中心的知识与模型像素输入空间之间的鸿沟。通过使用这些新领域的高保真、照片级真实感图像增强域泛化数据集的训练集,我们在各种基准测试的单域和多域泛化任务中均取得了显著优于现有方法的性能提升。由于能够对任意类别进行任意域的外推,我们的方法有潜力在无任何数据的情况下为任意任务学习通用模型。为阐明这一点,我们提出了一个更具挑战性的设定——无数据域泛化,旨在无任何收集数据的情况下学习通用模型。实证结果支持上述论点,我们的方法在此设定下表现出色,甚至在VLCS等数据集上超越了监督设定约1-2%。