Physical world adversarial attacks are highly practical and threatening: they fool real-world deep learning systems by generating conspicuous, maliciously crafted real-world artifacts. In physical world attacks, evaluating naturalness is strongly emphasized, since humans can easily detect and remove unnatural attacks. However, current studies evaluate naturalness in a case-by-case fashion, which suffers from errors, bias, and inconsistencies. In this paper, we take the first step to benchmark and assess the visual naturalness of physical world attacks, taking the autonomous driving scenario as a first attempt. First, to benchmark attack naturalness, we contribute the first Physical Attack Naturalness (PAN) dataset with human ratings and gaze. PAN verifies several insights for the first time: naturalness is (disparately) affected by contextual features (i.e., environmental and semantic variations) and correlates with behavioral features (i.e., gaze signals). Second, to automatically assess attack naturalness in a way that aligns with human ratings, we further introduce the Dual Prior Alignment (DPA) network, which aims to embed human knowledge into the model's reasoning process. Specifically, DPA imitates human reasoning in naturalness assessment through rating prior alignment and mimics human gaze behavior through attentive prior alignment. We hope our work fosters research to improve and automatically assess the naturalness of physical world attacks. Our code and dataset can be found at https://github.com/zhangsn-19/PAN.