The hypothesis that Convolutional Neural Networks (CNNs) are inherently texture-biased has shaped much of the discourse on feature use in deep learning. We revisit this hypothesis by examining limitations in the cue-conflict experiment by Geirhos et al. To address these limitations, we propose a domain-agnostic framework that quantifies feature reliance through systematic suppression of shape, texture, and color cues, avoiding the confounds of forced-choice conflicts. By evaluating humans and neural networks under controlled suppression conditions, we find that CNNs are not inherently texture-biased but predominantly rely on local shape features. Nonetheless, this reliance can be substantially mitigated through modern training strategies or architectures (ConvNeXt, ViTs). We further extend the analysis across computer vision, medical imaging, and remote sensing, revealing that reliance patterns differ systematically: computer vision models prioritize shape, medical imaging models emphasize color, and remote sensing models exhibit a stronger reliance on texture. Code is available at https://github.com/tomburgert/feature-reliance.
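As a rough illustration of the framework's core idea, the sketch below treats reliance on a cue as the relative accuracy drop observed when that cue is suppressed. This is a minimal, assumed formulation for exposition only; the exact metric, suppression procedures, and all numerical values here are hypothetical and not taken from the paper.

```python
# Minimal sketch: feature reliance as relative accuracy drop under cue suppression.
# All accuracies below are made-up placeholders for illustration.

def feature_reliance(acc_clean: float, acc_suppressed: float) -> float:
    """Assumed reliance measure: fraction of clean accuracy lost when a cue is removed."""
    return (acc_clean - acc_suppressed) / acc_clean

# Hypothetical accuracy of one model on unmodified images and under each suppression condition.
acc_clean = 0.92
acc_by_condition = {"shape": 0.55, "texture": 0.84, "color": 0.88}

for cue, acc in acc_by_condition.items():
    print(f"{cue:<8} reliance: {feature_reliance(acc_clean, acc):.2f}")
```

Under this (assumed) definition, a large drop when shape is suppressed would indicate strong shape reliance, mirroring the comparison the framework makes across cues, models, and domains.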