Illusory Attacks: Detectability Matters in Adversarial Attacks on Sequential Decision-Makers

Autonomous agents deployed in the real world need to be robust against adversarial attacks on sensory inputs. Robustifying agent policies requires anticipating the strongest attacks possible. We demonstrate that existing observation-space attacks on reinforcement learning agents have a common weakness: while effective, their lack of temporal consistency makes them detectable using automated means or human inspection. Detectability is undesirable to adversaries as it may trigger security escalations. We introduce perfect illusory attacks, a novel form of adversarial attack on sequential decision-makers that is both effective and provably statistically undetectable. We then propose the more versatile E-illusory attacks, which result in observation transitions that are consistent with the state-transition function of the environment and can be learned end-to-end. Compared to existing attacks, we empirically find E-illusory attacks to be significantly harder to detect with automated methods, and a small study with human subjects suggests they are similarly harder to detect for humans. We conclude that future work on adversarial robustness of (human-)AI systems should focus on defences against attacks that are hard to detect by design.
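To make the temporal-consistency point concrete, the following is a minimal toy sketch, not the paper's method or detector. It assumes a hypothetical, fully known, deterministic 1-D point-mass dynamics function (`step`) and a hypothetical threshold-based check (`is_flagged`). It illustrates why observations perturbed independently at each step can be flagged by comparing each observed transition with the one the dynamics predict, while a trajectory that follows the same dynamics from a different starting state passes the same check even though it misreports the true state.

```python
import numpy as np


def step(state, action):
    """Hypothetical known dynamics: 1-D point mass, state = [position, velocity]."""
    pos, vel = state
    vel = vel + 0.1 * action
    pos = pos + 0.1 * vel
    return np.array([pos, vel])


def rollout(initial_state, actions):
    """Apply the dynamics repeatedly and return all visited states."""
    states = [np.asarray(initial_state, dtype=float)]
    for act in actions:
        states.append(step(states[-1], act))
    return np.array(states)


def consistency_residuals(observations, actions):
    """Distance between each observed next observation and the predicted one."""
    residuals = []
    for obs, act, next_obs in zip(observations[:-1], actions, observations[1:]):
        predicted = step(obs, act)
        residuals.append(np.linalg.norm(next_obs - predicted))
    return np.array(residuals)


def is_flagged(observations, actions, threshold=1e-3):
    """Toy detector (assumption): flag if any transition violates the dynamics."""
    return bool(np.any(consistency_residuals(observations, actions) > threshold))


rng = np.random.default_rng(0)
actions = rng.uniform(-1.0, 1.0, size=20)

# Unattacked observations: the true trajectory.
clean = rollout([0.0, 0.0], actions)

# Temporally inconsistent attack: independent noise added at every step.
noisy = clean + rng.normal(scale=0.05, size=clean.shape)

# Temporally consistent stand-in: same dynamics and actions, different start state.
consistent_but_wrong = rollout([1.0, -0.5], actions)

print("clean flagged:", is_flagged(clean, actions))
print("noisy flagged:", is_flagged(noisy, actions))
print("consistent-but-wrong flagged:", is_flagged(consistent_but_wrong, actions))
```

In this sketch the detector only sees observations and actions, so it cannot distinguish the consistent-but-wrong trajectory from the true one. Real environments are typically stochastic and only approximately modelled, so a practical detector would need a statistical test rather than a fixed threshold.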
