Illusory Attacks: Detectability Matters in Adversarial Attacks on Sequential Decision-Makers

Autonomous agents deployed in the real world need to be robust against adversarial attacks on sensory inputs. Robustifying agent policies requires anticipating the strongest attacks possible. We demonstrate that existing observation-space attacks on reinforcement learning agents have a common weakness: while effective, their lack of temporal consistency makes them detectable using automated means or human inspection. Detectability is undesirable to adversaries as it may trigger security escalations. We introduce perfect illusory attacks, a novel form of adversarial attack on sequential decision-makers that is both effective and provably statistically undetectable. We then propose the more versatile R-attacks, which result in observation transitions that are consistent with the state-transition function of the adversary-free environment and can be learned end-to-end. Compared to existing attacks, we empirically find R-attacks to be significantly harder to detect with automated methods, and a small study with human subjects suggests they are similarly harder to detect for humans. We propose that undetectability should be a central concern in the study of adversarial attacks on mixed-autonomy settings.

