Convolutional neural networks (CNNs) for image processing tend to focus on localized texture patterns, a tendency commonly referred to as texture bias. While most previous work studies texture bias in image classification, we study the texture bias of CNNs in semantic segmentation. We propose to reduce this bias by training CNNs on pre-processed images that contain less texture. The challenge is to suppress image texture while preserving shape information. To this end, we use edge enhancing diffusion (EED), an anisotropic image diffusion method initially introduced for image compression, to create texture-reduced duplicates of existing datasets. We perform extensive numerical studies with both CNNs and vision transformer models trained on original data and on EED-processed data from the Cityscapes dataset and the CARLA driving simulator. We observe strong texture dependence in CNNs and moderate texture dependence in transformers. Training CNNs on EED-processed images makes the models completely insensitive to texture: they remain robust when texture is re-introduced to any degree. Additionally, we analyze the performance reduction in depth at the level of connected components in the semantic segmentation, and we study the influence of EED pre-processing on domain generalization and adversarial robustness.
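To make the pre-processing idea concrete, the following is a minimal sketch of one common formulation of edge enhancing diffusion on a single-channel image: diffusion is damped across edges, where the smoothed gradient is large, and left at full strength along them. It is not the implementation used in this work. The Charbonnier diffusivity, the parameter names `sigma`, `lam`, `tau` and `steps`, their default values, and the simple explicit finite-difference scheme are all assumptions chosen for illustration.

```python
import numpy as np


def gaussian_blur(u, sigma):
    """Separable Gaussian blur with reflective borders (NumPy only)."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    out = u.astype(float)
    for axis in (0, 1):
        pad = [(0, 0), (0, 0)]
        pad[axis] = (radius, radius)
        padded = np.pad(out, pad, mode="reflect")
        out = np.apply_along_axis(
            lambda row: np.convolve(row, kernel, mode="valid"), axis, padded
        )
    return out


def central_diff(u, axis):
    """Central difference with replicated border values."""
    pad = [(0, 0), (0, 0)]
    pad[axis] = (1, 1)
    p = np.pad(u, pad, mode="edge")
    if axis == 0:
        return (p[2:, :] - p[:-2, :]) / 2.0
    return (p[:, 2:] - p[:, :-2]) / 2.0


def diag_term(coef, u, axis):
    """Discretize d/dx(coef * du/dx) with half-grid fluxes and zero border flux."""
    c = np.moveaxis(coef, axis, 0)
    v = np.moveaxis(u, axis, 0)
    flux = 0.5 * (c[1:] + c[:-1]) * (v[1:] - v[:-1])
    out = np.zeros_like(v)
    out[:-1] += flux  # flux leaving through the upper face of each cell
    out[1:] -= flux   # flux entering through the lower face of each cell
    return np.moveaxis(out, 0, axis)


def eed_step(u, sigma=1.0, lam=0.1, tau=0.2):
    """One explicit step of du/dt = div(D grad u) (all defaults are assumptions)."""
    us = gaussian_blur(u, sigma)
    gy, gx = central_diff(us, 0), central_diff(us, 1)
    mag2 = gx ** 2 + gy ** 2
    # Assumed Charbonnier diffusivity: close to 1 in flat areas, small at edges.
    g = 1.0 / np.sqrt(1.0 + mag2 / lam ** 2)
    # D = g * n n^T + 1 * t t^T = I + (g - 1) * n n^T, with n the unit gradient.
    safe = mag2 + 1e-12
    a = 1.0 + (g - 1.0) * gx ** 2 / safe  # D_xx
    b = (g - 1.0) * gx * gy / safe        # D_xy
    c = 1.0 + (g - 1.0) * gy ** 2 / safe  # D_yy
    ux, uy = central_diff(u, 1), central_diff(u, 0)
    div = (
        diag_term(a, u, 1)
        + diag_term(c, u, 0)
        + central_diff(b * uy, 1)
        + central_diff(b * ux, 0)
    )
    return u + tau * div


def eed(image, steps=30, **kwargs):
    """Apply `steps` explicit EED iterations to a 2-D grayscale array."""
    u = np.asarray(image, dtype=float)
    for _ in range(steps):
        u = eed_step(u, **kwargs)
    return u
```

Calling `eed(gray_image, steps=30)` on a 2-D array with values in roughly [0, 1] returns an array of the same shape with fine-scale variation smoothed out. Colour images would need a per-channel or joint-tensor extension, which this sketch does not cover, and larger values of `tau` may make the explicit scheme unstable.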