While Neural Networks (NNs) have surpassed human accuracy in image classification on ImageNet, they often lack corruption robustness, i.e., robustness against image corruption. Yet such robustness is seemingly effortless for human perception. In this paper, we propose visually-continuous corruption robustness (VCR), an extension of corruption robustness that allows it to be assessed over the wide and continuous range of changes corresponding to human perceptual quality (i.e., from the original image to the full distortion of all perceived visual information), along with two novel human-aware metrics for NN evaluation. To compare the VCR of NNs with human perception, we conducted extensive experiments on 14 commonly used image corruptions with 7,718 human participants and state-of-the-art robust NN models with different training objectives (e.g., standard, adversarial, corruption robustness), different architectures (e.g., convolutional NNs, vision transformers), and different amounts of training data augmentation. Our study showed that: 1) assessing robustness against continuous corruption can reveal insufficient robustness undetected by existing benchmarks; as a result, 2) the gap between NN and human robustness is larger than previously known; and finally, 3) some image corruptions have a similar impact on human perception, offering opportunities for more cost-effective robustness assessments. Our validation set with 14 image corruptions, human robustness data, and the evaluation code are provided as a toolbox and a benchmark.