Recent research has investigated the shape and texture biases of pre-trained deep neural networks (DNNs) in image classification. Those works test how strongly a trained DNN relies on specific image cues such as texture. The present study shifts the focus to cue influence during training: we analyze what DNNs can learn from shape, texture, and color cues in the absence of the others, and we investigate their individual and combined influence on learning success. We analyze these cue influences at multiple levels by decomposing datasets into cue-specific versions. Addressing semantic segmentation, we train models on these reduced-cue datasets, creating cue experts. Early fusion of cues is performed by constructing datasets that contain the respective cue combinations. This is complemented by a late fusion of experts, which allows us to study how cue influence depends on image location at the pixel level. Experiments on Cityscapes, PASCAL Context, and a synthetic CARLA dataset show that, while no single cue dominates, the shape + color expert predominantly improves the prediction of small objects and border pixels. The cue performance order is consistent across the tested convolutional and transformer architectures, indicating similar cue extraction capabilities, although pre-trained transformers are said to be more biased towards shape than convolutional neural networks.
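To make the idea of pixel-level late fusion concrete, the following is a minimal sketch and not the authors' method: the abstract does not specify the fusion rule, so plain averaging of the experts' per-pixel class probabilities is assumed here. The function name `late_fuse`, the expert names, and the toy probabilities are all hypothetical. For every pixel, the sketch returns the fused class label and the expert that assigned the highest probability to that class, which is one simple way to inspect cue influence by location.

```python
from typing import Dict, List, Tuple

# Per-pixel class probabilities, indexed as probs[row][col][class].
ProbMap = List[List[List[float]]]


def late_fuse(experts: Dict[str, ProbMap]) -> Tuple[List[List[int]], List[List[str]]]:
    """Fuse cue experts by averaging their per-pixel class probabilities.

    Averaging is an assumed fusion rule chosen for illustration only.
    Returns the fused label map and, for each pixel, the name of the
    expert that gave the fused class its highest probability.
    """
    names = list(experts)
    first = experts[names[0]]
    height, width, n_classes = len(first), len(first[0]), len(first[0][0])

    labels: List[List[int]] = []
    leaders: List[List[str]] = []
    for r in range(height):
        label_row: List[int] = []
        leader_row: List[str] = []
        for c in range(width):
            # Mean probability of each class over all experts at this pixel.
            mean = [
                sum(experts[n][r][c][k] for n in names) / len(names)
                for k in range(n_classes)
            ]
            best = max(range(n_classes), key=mean.__getitem__)
            label_row.append(best)
            # Expert that is most confident in the fused class.
            leader_row.append(max(names, key=lambda n: experts[n][r][c][best]))
        labels.append(label_row)
        leaders.append(leader_row)
    return labels, leaders


# Toy 1x2 "image" with two classes and two hypothetical experts.
toy_experts = {
    "shape_color": [[[0.9, 0.1], [0.2, 0.8]]],
    "texture": [[[0.6, 0.4], [0.1, 0.9]]],
}
labels, leaders = late_fuse(toy_experts)
print(labels)   # fused class index per pixel
print(leaders)  # most confident expert per pixel
```

In practice the probability maps would come from the softmax outputs of the trained cue experts and would be stored as arrays; nested lists are used here only to keep the sketch dependency-free.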