The visual object category reports of artificial neural networks (ANNs) are notoriously sensitive to tiny, adversarial image perturbations. Because human category reports (aka human percepts) are thought to be insensitive to those same small-norm perturbations -- and locally stable in general -- this argues that ANNs are incomplete scientific models of human visual perception. Consistent with this, we show that when small-norm image perturbations are generated by standard ANN models, human object category percepts are indeed highly stable. However, in this very same "human-presumed-stable" regime, we find that robustified ANNs reliably discover low-norm image perturbations that strongly disrupt human percepts. These previously undetectable human perceptual disruptions are massive in amplitude, approaching the same level of sensitivity seen in robustified ANNs. Further, we show that robustified ANNs support precise perceptual state interventions: they guide the construction of low-norm image perturbations that strongly alter human category percepts toward specific prescribed percepts. These observations suggest that for arbitrary starting points in image space, there exists a set of nearby "wormholes", each leading the subject from their current category perceptual state into a semantically very different state. Moreover, contemporary ANN models of biological visual processing are now accurate enough to consistently guide us to those portals.
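The abstract does not say how the low-norm, target-directed perturbations are constructed. As a rough illustration of the general idea only, the sketch below runs targeted projected gradient descent inside an L2 ball of radius `eps` against a toy linear-softmax classifier. Everything in it is an assumption made for illustration and not the authors' procedure: the toy model (`W`, `b`), the function name `targeted_l2_perturbation`, and the values of `eps`, `step`, and `n_steps`. In the setting described above, a robustified ANN would take the place of the toy model, and the effect on human percepts would have to be measured behaviorally.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def targeted_l2_perturbation(x, W, b, target, eps=1.0, step=0.1, n_steps=100):
    """Push x toward class `target` while keeping ||x_adv - x||_2 <= eps.

    Hypothetical toy setup: the "model" is logits = W @ x + b.
    """
    delta = np.zeros_like(x)
    onehot = np.zeros(W.shape[0])
    onehot[target] = 1.0
    for _ in range(n_steps):
        p = softmax(W @ (x + delta) + b)
        # Gradient of the cross-entropy toward `target` with respect to the input.
        grad = W.T @ (p - onehot)
        g_norm = np.linalg.norm(grad)
        if g_norm == 0.0:
            break
        # Normalized descent step, then projection back onto the L2 ball.
        delta = delta - step * grad / g_norm
        d_norm = np.linalg.norm(delta)
        if d_norm > eps:
            delta = delta * (eps / d_norm)
    return x + delta


rng = np.random.default_rng(0)
W = rng.normal(size=(3, 16))   # toy weights: 3 categories, 16 "pixels"
b = np.zeros(3)
x = rng.normal(size=16)        # toy starting image

# Aim for the category the toy model currently considers least likely.
target = int(np.argmin(softmax(W @ x + b)))
x_adv = targeted_l2_perturbation(x, W, b, target, eps=1.0)

print("perturbation L2 norm:", np.linalg.norm(x_adv - x))
print("target probability before:", softmax(W @ x + b)[target])
print("target probability after: ", softmax(W @ x_adv + b)[target])
```

The projection step is what enforces the "low-norm" constraint: however many gradient steps are taken, the perturbed input never leaves the L2 ball of radius `eps` around the starting point.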