Over the past decade, deep neural networks have driven rapid progress and significant achievements on computer vision problems in both academia and industry. Yet despite their success, state-of-the-art image classification approaches fail to generalize well to previously unseen visual contexts, as many real-world applications require. In this paper, we focus on this domain generalization (DG) problem and argue that the generalization ability of deep convolutional neural networks can be improved by exploiting the network's multi-layer and multi-scale representations. We introduce a framework that aims to improve the domain generalization of image classifiers by combining low-level and high-level features at multiple scales. This enables the network to implicitly disentangle representations in its latent space and learn domain-invariant attributes of the depicted objects. Additionally, to further facilitate robust representation learning, we propose a novel objective function, inspired by contrastive learning, that constrains the extracted representations to remain invariant under distribution shifts. We demonstrate the effectiveness of our method by evaluating it on the PACS, VLCS, Office-Home, and NICO domain generalization datasets. Through extensive experimentation, we show that our model surpasses the performance of previous DG methods and consistently produces competitive and state-of-the-art results on all datasets.
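The abstract names two ingredients only at a high level: fusing features from several layers (and hence several spatial scales), and a contrastive-style objective that keeps representations invariant across domains. The sketch below illustrates both ideas under stated assumptions. It is not the authors' implementation. It assumes, hypothetically, that each layer's feature map is globally average-pooled, L2-normalized, and concatenated. It also substitutes a standard supervised contrastive (SupCon-style) loss for the paper's own objective, whose exact form the abstract does not give. In that loss, samples that share a class label count as positives whatever domain they come from. The function names `fuse_multilayer` and `supcon_loss` are hypothetical, and plain Python lists stand in for tensors.

```python
import math
from typing import List, Sequence

Vector = List[float]
FeatureMap = List[List[List[float]]]  # channels x height x width


def global_avg_pool(fmap: FeatureMap) -> Vector:
    """Average each channel over all of its spatial positions."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def l2_normalize(v: Sequence[float], eps: float = 1e-12) -> Vector:
    """Scale a vector to unit Euclidean length (eps guards against zero vectors)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def fuse_multilayer(fmaps: Sequence[FeatureMap]) -> Vector:
    """Hypothetical fusion: pool each layer, normalize it, then concatenate.

    Shallow layers (high resolution, low-level cues) and deep layers
    (low resolution, high-level semantics) contribute equally after
    per-layer normalization. The fused vector is normalized again.
    """
    fused: Vector = []
    for fmap in fmaps:
        fused.extend(l2_normalize(global_avg_pool(fmap)))
    return l2_normalize(fused)


def supcon_loss(embeddings: Sequence[Vector], labels: Sequence[int],
                temperature: float = 0.1) -> float:
    """SupCon-style loss, used here as a stand-in for the paper's objective.

    For each anchor, samples with the same class label are positives and all
    other samples are in the denominator. Minimizing the loss pulls same-class
    embeddings together even when they come from different domains.
    """
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled cosine similarities (inputs are unit vectors).
        logits = {j: sum(a * b for a, b in zip(embeddings[i], embeddings[j])) / temperature
                  for j in range(n) if j != i}
        # Numerically stable log-sum-exp over all non-anchor samples.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


if __name__ == "__main__":
    # Toy batch: two classes, each seen in two "domains" with slightly different features.
    emb = [l2_normalize(v) for v in ([1.0, 0.0], [0.99, 0.14], [0.0, 1.0], [0.14, 0.99])]
    print("aligned labels:", supcon_loss(emb, [0, 0, 1, 1]))
    print("mismatched labels:", supcon_loss(emb, [0, 1, 0, 1]))
```

When the labels match the geometry, so that each class forms one cluster spanning both domains, the loss is small. When the labels mismatch it, the loss is large. Per-layer normalization before concatenation is a design choice here: it keeps a layer with many channels or large activations from dominating the fused representation.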