Convolutional Neural Networks (CNNs) exhibit a well-known texture bias, prioritizing local patterns over global shapes, a tendency inherent to their convolutional architecture. While this bias is beneficial for texture-rich natural images, it often degrades performance on shape-dominant data such as illustrations and sketches. Although prior work has proposed shape-biased models to mitigate this issue, these approaches lack a quantitative metric for identifying which datasets would actually benefit from such modifications. To address this limitation, we propose a data-driven metric that quantifies the shape-texture balance of a dataset by computing the Structural Similarity Index (SSIM) between each image's luminance (Y) channel and its L0-smoothed counterpart. Building on this metric, we introduce a computationally efficient adaptation method that promotes shape bias by modifying the dilation of max-pooling operations while keeping convolutional weights frozen. Because only the final classification layer is trained, the method remains practical in low-data regimes where full fine-tuning is infeasible, and experimental results demonstrate consistent accuracy improvements on shape-dominant datasets.
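The proposed metric can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses BT.601 luma weights for the Y channel, a simplified single-window SSIM rather than the standard windowed variant, and a Gaussian blur as a stand-in for the paper's L0 smoothing (an L0 smoother such as `cv2.ximgproc.l0Smooth` from opencv-contrib could be substituted). The function name `shape_texture_score` and the `sigma` parameter are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def luminance(rgb):
    # BT.601 luma weights; rgb is an HxWx3 array in [0, 1].
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

def ssim_global(x, y, data_range=1.0, k1=0.01, k2=0.03):
    # Simplified SSIM computed over the whole image as a single window.
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def shape_texture_score(rgb, sigma=2.0):
    # High SSIM: smoothing changes little, so the image is shape-dominant.
    # Low SSIM: smoothing removes fine detail, so the image is texture-rich.
    # Gaussian blur stands in here for the L0 smoothing used in the paper.
    y = luminance(rgb)
    return ssim_global(y, gaussian_filter(y, sigma))
```

Averaging this score over a dataset gives a single number characterizing its shape-texture balance, which can then guide whether the shape-biased adaptation is worth applying.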
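A sketch of the adaptation idea in PyTorch, under stated assumptions: the abstract specifies only that pooling dilation is modified while convolutional weights stay frozen, so the helper names (`promote_shape_bias`, `freeze_backbone`), the uniform dilation value, and the `head_name` default are illustrative, not the paper's exact procedure.

```python
import torch
import torch.nn as nn

def promote_shape_bias(model, dilation=2):
    # Replace each MaxPool2d with a dilated variant (same kernel, stride,
    # padding), enlarging the pooling receptive field so pooled responses
    # aggregate over a wider spatial extent. Conv weights are untouched.
    for name, module in model.named_children():
        if isinstance(module, nn.MaxPool2d):
            setattr(model, name, nn.MaxPool2d(
                kernel_size=module.kernel_size,
                stride=module.stride,
                padding=module.padding,
                dilation=dilation,
            ))
        else:
            promote_shape_bias(module, dilation)
    return model

def freeze_backbone(model, head_name="fc"):
    # Freeze all weights, then re-enable gradients only for the final
    # classification layer (here assumed to be named `head_name`).
    for p in model.parameters():
        p.requires_grad = False
    for p in getattr(model, head_name).parameters():
        p.requires_grad = True
    return model
```

Note that increasing pooling dilation changes the spatial size of feature maps downstream, so in practice padding or input resolution may need adjusting to keep the classifier's input dimensions consistent.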