Deep neural networks have shown remarkable performance in image classification. However, their performance deteriorates significantly when the input data are corrupted. Domain generalization methods have been proposed to train models that are robust to out-of-distribution data. Data augmentation in the frequency domain is one such approach: it changes the amplitudes of the input data while preserving the phases, which enables a model to learn phase features and establish domain-invariant representations. However, keeping the phases fixed leaves the model susceptible to phase fluctuations, because both amplitude and phase fluctuations commonly occur in out-of-distribution data. In this study, to address this problem, we introduce an approach that applies a finite variation to the phases of the input data rather than keeping them fixed. Based on the assumption that the degree of domain-invariant features differs from phase to phase, we propose a method to distinguish phases according to this degree. In addition, we propose a method called vital phase augmentation (VIPAug) that applies the variation to each phase differently, according to its degree of domain-invariant features. As a result, the model depends more on the vital phases, which contain more domain-invariant features, and thereby attains robustness to both amplitude and phase fluctuations. We present experimental evaluations of the proposed approach, which exhibited improved performance on both clean and corrupted data. VIPAug achieved state-of-the-art (SOTA) performance on the CIFAR-10 and CIFAR-100 benchmark datasets, and near-SOTA performance on the ImageNet-100 and ImageNet datasets. Our code is available at https://github.com/excitedkid/vipaug.
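The idea of varying phases by different amounts can be illustrated with a minimal NumPy sketch. This is not the VIPAug implementation from the repository above; it only shows the general pattern of perturbing amplitudes while applying a small bounded shift to some phases and a larger one to the rest. The function name `phase_aware_augment`, all of its parameters, and in particular the rule used to pick "vital" phases (the frequencies with the largest amplitudes) are assumptions made for illustration; the abstract does not specify how vital phases are identified.

```python
import numpy as np

def phase_aware_augment(image, vital_fraction=0.1, vital_shift=0.01,
                        other_shift=0.2, amp_scale=0.3, rng=None):
    """Illustrative frequency-domain augmentation of a 2D grayscale image.

    Hypothetical sketch: 'vital' phases are assumed to be those at the
    frequencies with the largest amplitudes. This selection rule is a
    stand-in, not the criterion used by VIPAug.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Decompose the image into amplitude and phase spectra.
    spectrum = np.fft.fft2(image)
    amplitude = np.abs(spectrum)
    phase = np.angle(spectrum)

    # Assumed rule: the top `vital_fraction` of amplitudes mark vital phases.
    threshold = np.quantile(amplitude, 1.0 - vital_fraction)
    vital_mask = amplitude >= threshold

    # Bounded (finite) phase variation: small for vital, larger for the rest.
    max_shift = np.where(vital_mask, vital_shift, other_shift)
    new_phase = phase + max_shift * rng.uniform(-1.0, 1.0, size=phase.shape)

    # Multiplicative amplitude perturbation in [1 - amp_scale, 1 + amp_scale].
    factor = 1.0 + amp_scale * rng.uniform(-1.0, 1.0, size=amplitude.shape)
    new_amplitude = amplitude * factor

    # Recombine and invert. The random perturbation breaks conjugate
    # symmetry, so the inverse is complex; keep only the real part.
    augmented = np.fft.ifft2(new_amplitude * np.exp(1j * new_phase))
    return np.real(augmented)
```

Setting `vital_shift`, `other_shift`, and `amp_scale` to zero returns the input unchanged, which is a convenient sanity check. Using a smaller shift for the assumed vital phases mirrors the stated goal of having the model rely more on phases that carry domain-invariant features.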