Embodied agents that rely on deep neural networks for vision navigation have attracted increasing attention. However, deep neural networks are vulnerable to malicious adversarial noise, which can cause catastrophic failures in Embodied Vision Navigation. Among such noise, universal adversarial perturbations (UAP), i.e., a single image-agnostic perturbation applied to every frame received by the agent, are especially critical for Embodied Vision Navigation because they are computationally efficient and practical to deploy during an attack. Existing UAP methods, however, do not consider the system dynamics of Embodied Vision Navigation. To extend UAP to the sequential decision setting, we formulate the environment disturbed by the universal noise $\delta$ as a $\delta$-disturbed Markov Decision Process ($\delta$-MDP). Based on this formulation, we analyze the properties of the $\delta$-MDP and propose two novel Consistent Attack methods for attacking embodied agents, which are the first to account for the dynamics of the MDP by estimating the disturbed Q function and the disturbed distribution. Regardless of the victim model, our Consistent Attack causes a significant drop in performance on the PointGoal task in Habitat. Extensive experimental results indicate that applying Embodied Vision Navigation methods in the real world carries potential risks.
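As a rough illustration of what "image-agnostic perturbation applied to every frame" means, the following minimal sketch adds one fixed $\delta$ to each frame of an episode. It is not the Consistent Attack itself: the $\ell_\infty$ budget `eps`, the `[0, 1]` pixel range, the frame shape, and the helper names `project_linf` and `perturb_frame` are all assumptions made for this example, and the random `delta` merely stands in for a perturbation that an attack would optimize.

```python
import numpy as np

def project_linf(delta, eps):
    """Clip the perturbation to an assumed L-infinity budget eps."""
    return np.clip(delta, -eps, eps)

def perturb_frame(frame, delta, eps):
    """Add the same universal perturbation to a frame, keeping pixels in [0, 1]."""
    return np.clip(frame + project_linf(delta, eps), 0.0, 1.0)

rng = np.random.default_rng(0)
eps = 8 / 255                      # assumed budget, not taken from the source
shape = (4, 4, 3)                  # tiny hypothetical RGB frame

# Placeholder perturbation; a real attack would optimize this tensor.
delta = rng.uniform(-1.0, 1.0, size=shape)

# A hypothetical episode: the agent observes five different frames.
frames = [rng.uniform(0.0, 1.0, size=shape) for _ in range(5)]

# The identical delta is reused at every time step (image-agnostic).
perturbed = [perturb_frame(f, delta, eps) for f in frames]

max_change = max(np.abs(p - f).max() for p, f in zip(perturbed, frames))
```

Because `delta` is computed once and reused, the per-step cost during the attack is a single addition and clip, which is the efficiency argument made above. In the $\delta$-MDP view, the agent acts on `perturbed` observations while the underlying environment dynamics stay unchanged.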