Does resistance to Style-Transfer equal Shape Bias? Evaluating Shape Bias by Distorted Shape

Deep learning models are known to exhibit a strong texture bias, while humans tend to rely heavily on global shape for object recognition. The current benchmark for evaluating a model's shape bias is a set of style-transferred images, under the assumption that resistance to style-transfer attacks is related to the development of shape sensitivity in the model. In this work, we show that networks trained with style-transferred images indeed learn to ignore style, but their shape bias arises primarily from local shapes. We provide a Distorted Shape Testbench (DiST) as an alternative measurement of global shape sensitivity. Our test includes 2400 original images from ImageNet-1K, each accompanied by two images in which the global shape of the original is distorted while its texture is preserved via a texture synthesis program. We found that (1) models that performed well on the previous shape bias evaluation do not fare well on the proposed DiST; (2) the widely adopted ViT models do not show significant advantages over Convolutional Neural Networks (CNNs) on this benchmark, even though ViTs rank higher on the previous shape bias tests; (3) training with DiST images bridges the significant gap between human and existing SOTA model performance while preserving the models' accuracy on standard image classification tasks; and (4) training with DiST images and training with style-transferred images are complementary, and the two can be combined to enhance both the global and local shape sensitivity of the network. Our code will be hosted at: https://github.com/leelabcnbc/DiST
