The visual object category reports of artificial neural networks (ANNs) are notoriously sensitive to tiny, adversarial image perturbations. Because human category reports (aka human percepts) are thought to be insensitive to those same small-norm perturbations -- and locally stable in general -- this argues that ANNs are incomplete scientific models of human visual perception. Consistent with this, we show that when small-norm image perturbations are generated by standard ANN models, human object category percepts are indeed highly stable. However, in this very same "human-presumed-stable" regime, we find that robustified ANNs reliably discover low-norm image perturbations that strongly disrupt human percepts. These previously undetectable human perceptual disruptions are massive in amplitude, approaching the same level of sensitivity seen in robustified ANNs. Further, we show that robustified ANNs support precise perceptual state interventions: they guide the construction of low-norm image perturbations that strongly alter human category percepts toward specific prescribed percepts. These observations suggest that for arbitrary starting points in image space, there exists a set of nearby "wormholes", each leading the subject from their current category perceptual state into a semantically very different state. Moreover, contemporary ANN models of biological visual processing are now accurate enough to consistently guide us to those portals.