During the past decade, deep neural networks have driven rapid progress and significant achievements on computer vision problems in both academia and industry. Yet despite their success, state-of-the-art image classification approaches fail to generalize well to previously unseen visual contexts, as required by many real-world applications. In this paper, we focus on this domain generalization (DG) problem and argue that the generalization ability of deep convolutional neural networks can be improved by taking advantage of the network's multi-layer and multi-scale representations. We introduce a framework that aims to improve the domain generalization of image classifiers by combining low-level and high-level features at multiple scales, enabling the network to implicitly disentangle representations in its latent space and learn domain-invariant attributes of the depicted objects. Additionally, to further facilitate robust representation learning, we propose a novel objective function, inspired by contrastive learning, which constrains the extracted representations to remain invariant under distribution shifts. We demonstrate the effectiveness of our method by evaluating it on the PACS, VLCS, Office-Home and NICO domain generalization datasets. Through extensive experimentation, we show that our model surpasses the performance of previous DG methods and consistently produces competitive and state-of-the-art results on all datasets.
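To make the two ideas above concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not specify how features are combined or the exact form of the objective, so this sketch assumes simple concatenation of per-layer pooled feature vectors and swaps in a generic InfoNCE-style contrastive loss over cosine similarities. All function names (`multi_scale_embedding`, `info_nce`) and the `temperature` parameter are hypothetical and chosen for illustration only.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returns a copy if the norm is zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    u, v = l2_normalize(u), l2_normalize(v)
    return sum(a * b for a, b in zip(u, v))


def multi_scale_embedding(layer_features):
    """Assumption: combine features by concatenating the L2-normalized,
    already-pooled feature vector of each chosen layer (low to high level)."""
    embedding = []
    for features in layer_features:
        embedding.extend(l2_normalize(features))
    return embedding


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE-style loss (an assumed stand-in for the paper's objective).

    The loss is small when the anchor is close to the positive (e.g. the same
    class seen in another domain) and far from every negative."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Usage: two layers per image, each already pooled to a short vector.
photo_dog = multi_scale_embedding([[1.0, 0.0], [0.9, 0.1, 0.0]])
sketch_dog = multi_scale_embedding([[0.9, 0.1], [1.0, 0.0, 0.1]])
photo_car = multi_scale_embedding([[0.0, 1.0], [0.0, 0.2, 1.0]])

# Same class across domains as the positive pair, a different class as negative.
loss = info_nce(photo_dog, sketch_dog, [photo_car])
```

Under these assumptions, minimizing such a loss pulls together embeddings of the same object class drawn from different domains, which is one plausible reading of "remaining invariant under distribution shifts"; the actual objective in the paper may differ.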