Deep neural networks have shown remarkable performance in image classification. However, their performance deteriorates significantly on corrupted input data. Domain generalization methods have been proposed to train models that are robust to out-of-distribution data. Data augmentation in the frequency domain is one such approach: it enables a model to learn phase features and thereby establish domain-invariant representations. This approach changes the amplitudes of the input data while preserving the phases. However, keeping the phases fixed leaves the model susceptible to phase fluctuations, because both amplitude and phase fluctuations commonly occur in out-of-distribution data. In this study, to address this problem, we introduce an approach that applies finite variation to the phases of the input data rather than keeping them fixed. Based on the assumption that the degree of domain-invariant features varies from phase to phase, we propose a method to distinguish phases according to this degree. In addition, we propose a method called vital phase augmentation (VIPAug) that applies the variation to each phase differently according to its degree of domain-invariant features. The model thus depends more on the vital phases, which contain more domain-invariant features, and attains robustness to both amplitude and phase fluctuations. We present experimental evaluations of the proposed approach, which show improved performance on both clean and corrupted data. VIPAug achieved state-of-the-art (SOTA) performance on the benchmark CIFAR-10 and CIFAR-100 datasets, as well as near-SOTA performance on the ImageNet-100 and ImageNet datasets. Our code is available at https://github.com/excitedkid/vipaug.
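
As a rough illustration of the idea described above, the following minimal sketch perturbs the amplitude spectrum of an image and applies phase noise of different strength to two groups of phases. It is not the released VIPAug implementation. The function name `phase_aware_augment`, the boolean `vital_mask`, the parameters `sigma_vital`, `sigma_other`, and `amp_scale`, and the choice of Gaussian phase noise are all assumptions made for illustration. The abstract does not state how vital phases are identified or which group receives the larger variation; the sketch assumes a mask supplied by the caller and smaller variation on the vital phases.

```python
import numpy as np


def phase_aware_augment(image, vital_mask, sigma_vital=0.01, sigma_other=0.5,
                        amp_scale=0.3, rng=None):
    """Perturb amplitude and phase of a 2D grayscale image of shape (H, W).

    All names and defaults are hypothetical; `vital_mask` is a boolean (H, W)
    array marking the frequencies assumed to carry "vital" phases.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Decompose the image into amplitude and phase spectra.
    spectrum = np.fft.fft2(image)
    amplitude = np.abs(spectrum)
    phase = np.angle(spectrum)

    # Amplitude change: random multiplicative factor in [1 - amp_scale, 1 + amp_scale].
    amplitude = amplitude * (1.0 + amp_scale * rng.uniform(-1.0, 1.0, amplitude.shape))

    # Phase change: assumed Gaussian noise, weaker on vital phases, stronger elsewhere.
    sigma = np.where(vital_mask, sigma_vital, sigma_other)
    phase = phase + sigma * rng.standard_normal(phase.shape)

    # Recombine and invert. The perturbation breaks conjugate symmetry, so the
    # inverse transform is complex in general; only the real part is kept.
    augmented = np.fft.ifft2(amplitude * np.exp(1j * phase))
    return np.real(augmented)


# Usage with a random image and a placeholder mask (assumption, for illustration only).
rng = np.random.default_rng(0)
img = rng.random((8, 8))
mask = np.zeros((8, 8), dtype=bool)
mask[:2, :2] = True
aug = phase_aware_augment(img, mask, rng=rng)
```

Setting `sigma_vital`, `sigma_other`, and `amp_scale` to zero returns the input unchanged, which is a convenient sanity check. For the authors' actual procedure, refer to the repository linked in the abstract.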