Convolutional Neural Networks (CNNs) are known to exhibit a strong texture bias, favoring local patterns over global shape information, a tendency inherent to their convolutional architecture. While this bias is beneficial for texture-rich natural images, it often degrades performance on shape-dominant data such as illustrations and sketches. Although prior work has proposed shape-biased models to mitigate this issue, these approaches lack a quantitative metric for identifying which datasets would actually benefit from such modifications. To address this gap, we propose a data-driven metric that quantifies the shape-texture balance of a dataset by computing the Structural Similarity Index (SSIM) between each image's luminance channel and its L0-smoothed counterpart. Building on this metric, we further introduce a computationally efficient adaptation method that promotes shape bias by modifying the dilation of max-pooling operations while keeping the convolutional weights frozen, so that only the final classification layer requires training. Experimental results show that this approach consistently improves classification accuracy on shape-dominant datasets, particularly in low-data regimes where full fine-tuning is impractical.
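The adaptation changes only the dilation of the pooling layers, not the learned weights. A minimal NumPy sketch of what a dilated max-pool computes is shown below; the function name and defaults are illustrative. Dilation spreads the pooling taps apart, enlarging the receptive field without adding parameters, which is why the convolutional weights can stay frozen.

```python
import numpy as np

def dilated_max_pool2d(x, kernel=2, stride=2, dilation=1):
    """Max pooling over a dilated sampling grid on a 2-D array, following
    the semantics of dilated max pooling in common frameworks: the taps of
    the kernel x kernel window are spaced `dilation` pixels apart."""
    eff = dilation * (kernel - 1) + 1  # effective window extent
    H, W = x.shape
    out_h = (H - eff) // stride + 1
    out_w = (W - eff) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            win = x[i * stride : i * stride + eff : dilation,
                    j * stride : j * stride + eff : dilation]
            out[i, j] = win.max()
    return out
```

In a pretrained PyTorch model, the analogous change would be to set the `dilation` argument of each `torch.nn.MaxPool2d` module and then train only the final classifier; the exact layers to modify are a per-architecture choice not specified here.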