Out-of-distribution (OOD) generalization is a desirable yet challenging property for deep neural networks. The core challenge lies in the limited availability of source domains, which makes it hard for models to learn invariant representations and discard spurious features. Various domain augmentation methods have been proposed, but they largely rely on interpolating existing domains and frequently fail to create truly "novel" domains. Humans, on the other hand, can easily extrapolate to novel domains; thus, an intriguing question arises: how can neural networks extrapolate like humans and achieve OOD generalization? We introduce a novel approach to domain extrapolation that leverages the reasoning ability and extensive knowledge encapsulated within large language models (LLMs) to synthesize entirely new domains. Starting from the class of interest, we query the LLMs to extract knowledge relevant to these novel domains. We then bridge the gap between the text-centric knowledge derived from LLMs and the pixel input space of the model using text-to-image generation techniques. By augmenting the training sets of domain generalization datasets with high-fidelity, photo-realistic images of these new domains, we achieve significant improvements over all existing methods, as demonstrated in both single- and multi-domain generalization across various benchmarks. With the ability to extrapolate arbitrary domains for any class, our method has the potential to learn a generalized model for any task without any data. To illustrate, we put forth a much more difficult setting, termed data-free domain generalization, which aims to learn a generalized model in the absence of any collected data. Our empirical findings support this argument, and our method exhibits commendable performance in this setting, even surpassing the supervised setting by approximately 1-2\% on datasets such as VLCS.
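The pipeline described above (query an LLM for novel domains, then render them with a text-to-image model) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `query_llm_for_domains` and `text_to_image` are hypothetical stand-ins for a real LLM API and a diffusion-based text-to-image model, stubbed here so the orchestration logic is self-contained and runnable.

```python
# Hypothetical sketch of an LLM-driven domain-extrapolation pipeline.
# Both helper functions below are stubs standing in for external models.

def query_llm_for_domains(class_name, n_domains=3):
    """Stand-in for prompting an LLM, e.g. 'List novel visual domains for <class>'."""
    canned = ["origami", "charcoal sketch", "stained glass"]
    return canned[:n_domains]

def text_to_image(prompt):
    """Stand-in for a text-to-image model; returns a placeholder 'image'."""
    return f"<image of: {prompt}>"

def extrapolate_training_set(classes, images_per_domain=2):
    """Augment the training set with synthetic images from extrapolated domains."""
    augmented = []
    for cls in classes:
        for domain in query_llm_for_domains(cls):
            prompt = f"a photo of a {cls} in the style of {domain}"
            for _ in range(images_per_domain):
                # Each sample pairs a generated image with its class label;
                # the domain tag can drive domain-aware training objectives.
                augmented.append((text_to_image(prompt), cls, domain))
    return augmented

synthetic = extrapolate_training_set(["dog", "guitar"])
print(len(synthetic))  # 2 classes x 3 domains x 2 images per domain = 12
```

In the data-free setting mentioned above, such synthetic samples would constitute the entire training set rather than an augmentation of collected data.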