Frequency analysis is useful for understanding the mechanisms of representation learning in neural networks (NNs). Most research in this area focuses on the learning dynamics of NNs in regression tasks, while little addresses classification. This study empirically investigates the latter and expands the understanding of frequency shortcuts. First, we perform experiments on synthetic datasets designed to be biased in different frequency bands. Our results demonstrate that NNs tend to find simple solutions for classification, and that what they learn first during training depends on the most distinctive frequency characteristics, which can be either low or high frequencies. Second, we confirm this phenomenon on natural images. We propose a metric to measure class-wise frequency characteristics and a method to identify frequency shortcuts. The results show that frequency shortcuts can be texture-based or shape-based, depending on what best simplifies the objective. Third, we validate the transferability of frequency shortcuts on out-of-distribution (OOD) test sets. Our results suggest that frequency shortcuts can transfer across datasets and cannot be fully avoided by increasing model capacity or applying data augmentation. We recommend that future research focus on effective training schemes that mitigate frequency shortcut learning.
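To make the notion of a "frequency band" concrete, the sketch below splits a grayscale image into bands using a radial mask on its 2D Fourier spectrum and reports how much spectral energy falls in a band. It is an illustration only, not the metric or the shortcut-identification method proposed in this study. The function names (`radial_frequency_grid`, `band_filter`, `band_energy_fraction`) and the cutoff of 0.1 cycles per pixel are assumptions introduced here.

```python
import numpy as np


def radial_frequency_grid(height, width):
    """Distance of each FFT bin from the zero-frequency bin, in cycles per pixel."""
    fy = np.fft.fftfreq(height)[:, None]
    fx = np.fft.fftfreq(width)[None, :]
    return np.sqrt(fy**2 + fx**2)


def band_filter(image, low, high):
    """Keep only spatial frequencies whose radius r satisfies low <= r < high."""
    radius = radial_frequency_grid(*image.shape)
    mask = (radius >= low) & (radius < high)
    spectrum = np.fft.fft2(image)
    # The mask is symmetric in frequency, so the inverse transform is real
    # up to rounding error; np.real discards the negligible imaginary part.
    return np.real(np.fft.ifft2(spectrum * mask))


def band_energy_fraction(image, low, high):
    """Fraction of the total spectral energy that lies in the band [low, high)."""
    radius = radial_frequency_grid(*image.shape)
    mask = (radius >= low) & (radius < high)
    power = np.abs(np.fft.fft2(image)) ** 2
    return power[mask].sum() / power.sum()


# Hypothetical usage on random data: split an image at an assumed cutoff.
rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))
cutoff = 0.1  # assumed value, in cycles per pixel
low_part = band_filter(image, 0.0, cutoff)
high_part = band_filter(image, cutoff, 1.0)
```

Because the two bands partition the spectrum, `low_part + high_part` reconstructs the input up to floating-point error. Training or evaluating a classifier on only one of the two parts is one simple way to probe which band it relies on.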