Frequency analysis is useful for understanding the mechanisms of representation learning in neural networks (NNs). Most research in this area focuses on the learning dynamics of NNs in regression tasks, while classification has received little attention. This study empirically investigates the latter and expands the understanding of frequency shortcuts. First, we perform experiments on synthetic datasets designed to be biased in different frequency bands. Our results demonstrate that NNs tend to find simple solutions for classification, and that what they learn first during training depends on the most distinctive frequency characteristics, which can be either low or high frequencies. Second, we confirm this phenomenon on natural images. We propose a metric to measure class-wise frequency characteristics and a method to identify frequency shortcuts. The results show that frequency shortcuts can be texture-based or shape-based, depending on what best simplifies the objective. Third, we validate the transferability of frequency shortcuts on out-of-distribution (OOD) test sets. Our results suggest that frequency shortcuts can transfer across datasets and cannot be fully avoided by increasing model capacity or applying data augmentation. We recommend that future research focus on effective training schemes that mitigate frequency shortcut learning.
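To make the notion of a "frequency band" of an image concrete, the following is a minimal sketch of radial band-pass filtering with a 2D FFT. It is an illustration only: the abstract does not specify how the synthetic datasets, the class-wise metric, or the shortcut identification method are constructed, so the function names (`radial_band_mask`, `band_pass`), the radial definition of a band, and the toy image are all assumptions made here.

```python
import numpy as np


def radial_band_mask(height, width, r_low, r_high):
    """Boolean mask over the centered spectrum keeping r_low <= radius < r_high.

    Hypothetical helper: radius is measured in integer frequency indices
    from the zero-frequency (DC) component.
    """
    fy = np.fft.fftshift(np.fft.fftfreq(height)) * height
    fx = np.fft.fftshift(np.fft.fftfreq(width)) * width
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    return (radius >= r_low) & (radius < r_high)


def band_pass(image, r_low, r_high):
    """Keep only the spatial frequencies of a 2D array inside one radial band."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    mask = radial_band_mask(image.shape[0], image.shape[1], r_low, r_high)
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    # The mask is symmetric about DC, so the imaginary part is only round-off.
    return filtered.real


# Toy image (assumed for illustration): one low- and one high-frequency grating.
size = 32
x = np.arange(size)
low = np.tile(np.cos(2 * np.pi * 2 * x / size), (size, 1))    # frequency index 2
high = np.tile(np.cos(2 * np.pi * 12 * x / size), (size, 1))  # frequency index 12
image = low + high

low_part = band_pass(image, 0, 5)     # isolates the low-frequency grating
high_part = band_pass(image, 8, 17)   # isolates the high-frequency grating
```

Under these assumptions, restricting an input to a single band in this way is one simple means of probing which frequencies a classifier relies on; the paper's own procedure may differ.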