Convolutional neural networks (CNNs) for image processing tend to rely on localized texture patterns, a tendency commonly referred to as texture bias. While most previous works in the literature focus on the task of image classification, we go beyond this and study the texture bias of CNNs in semantic segmentation. In this work, we propose to train CNNs on pre-processed images with reduced texture in order to lower the texture bias. The challenge here is to suppress image texture while preserving shape information. To this end, we utilize edge enhancing diffusion (EED), an anisotropic image diffusion method originally introduced for image compression, to create texture-reduced duplicates of existing datasets. Extensive numerical studies are performed with both CNNs and vision transformer models trained on original data and EED-processed data from the Cityscapes dataset and the CARLA driving simulator. We observe a strong texture dependence of CNNs and a moderate texture dependence of transformers. Training CNNs on EED-processed images makes the models effectively ignorant of texture, rendering them resilient to the re-introduction of texture to any degree. Additionally, we analyze the performance reduction in depth on the level of connected components in the semantic segmentation and study the influence of EED pre-processing on domain generalization as well as adversarial robustness.
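To make the EED pre-processing step concrete, the following is a minimal sketch of one explicit time step of edge-enhancing diffusion in the spirit of Weickert's formulation: the diffusion tensor diffuses freely along edges but suppresses diffusion across them via a gradient-dependent diffusivity. All parameters (`tau`, `lam`, the Charbonnier-type diffusivity, and the simplified central-difference discretization with periodic boundaries) are illustrative assumptions, not the pipeline used in the paper.

```python
import numpy as np

def eed_step(u, tau=0.1, lam=1.0):
    """One explicit step of edge-enhancing diffusion (illustrative sketch)."""
    # Central differences with periodic boundaries (via np.roll).
    ux = 0.5 * (np.roll(u, -1, axis=1) - np.roll(u, 1, axis=1))
    uy = 0.5 * (np.roll(u, -1, axis=0) - np.roll(u, 1, axis=0))
    grad2 = ux**2 + uy**2
    # Charbonnier-type diffusivity: ~1 in flat (textureless) regions,
    # small across strong edges, so edges are preserved.
    g = 1.0 / np.sqrt(1.0 + grad2 / lam**2)
    eps = 1e-8
    nx = ux / np.sqrt(grad2 + eps)  # unit gradient direction (across edge)
    ny = uy / np.sqrt(grad2 + eps)
    # Diffusion tensor D = g * n n^T + 1 * t t^T  (t = tangent along edge):
    a = g * nx**2 + ny**2           # D_xx
    b = (g - 1.0) * nx * ny         # D_xy
    c = g * ny**2 + nx**2           # D_yy
    # Flux j = D grad u, then u_t = div(j); a simple (non-optimized) stencil.
    jx = a * ux + b * uy
    jy = b * ux + c * uy
    div = 0.5 * (np.roll(jx, -1, axis=1) - np.roll(jx, 1, axis=1)) \
        + 0.5 * (np.roll(jy, -1, axis=0) - np.roll(jy, 1, axis=0))
    return u + tau * div

# Toy usage: a noisy step edge. The noise ("texture") is smoothed away
# while the step edge (shape information) is largely preserved.
rng = np.random.default_rng(0)
img = np.zeros((32, 32))
img[:, 16:] = 1.0
img += 0.1 * rng.standard_normal(img.shape)
out = img.copy()
for _ in range(50):
    out = eed_step(out)
```

In this sketch the local noise variance in the flat regions drops after diffusion, while the intensity jump at the step edge survives, which is exactly the "suppress texture, preserve shape" behavior the abstract describes.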