White-Box Adversarial Policies in Deep Reinforcement Learning

In reinforcement learning (RL), adversarial policies can be developed by training an adversarial agent to minimize a target agent's rewards. Prior work has studied black-box versions of these attacks where the adversary only observes the world state and treats the target agent as any other part of the environment. However, this does not take into account additional structure in the problem. In this work, we take inspiration from the literature on white-box attacks to train more effective adversarial policies. We study white-box adversarial policies and show that having access to a target agent's internal state can be useful for identifying its vulnerabilities. We make two contributions. (1) We introduce white-box adversarial policies where an attacker observes both a target's internal state and the world state at each timestep. We formulate ways of using these policies to attack agents in 2-player games and text-generating language models. (2) We demonstrate that these policies can achieve higher initial and asymptotic performance against a target agent than black-box controls. Code is available at https://github.com/thestephencasper/lm_white_box_attacks
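The difference between the black-box and white-box settings described above comes down to what the attacker is given as its observation at each timestep. The following is a minimal sketch of that distinction, not the authors' implementation: the tiny two-layer target network, the choice to expose its hidden activations and action logits as the "internal state", and all function names (`make_target`, `target_forward`, `black_box_observation`, `white_box_observation`, `adversary_reward`) are illustrative assumptions.

```python
import math
import random


def make_target(obs_dim, hidden_dim, n_actions, seed=0):
    """Build a toy target policy: random weights for a 2-layer network (assumption)."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(obs_dim)] for _ in range(hidden_dim)]
    w2 = [[rng.uniform(-1, 1) for _ in range(hidden_dim)] for _ in range(n_actions)]
    return w1, w2


def target_forward(target, world_state):
    """Run the target on the world state; return hidden activations and action logits."""
    w1, w2 = target
    hidden = [math.tanh(sum(w * x for w, x in zip(row, world_state))) for row in w1]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    return hidden, logits


def black_box_observation(world_state):
    """Black-box attacker: sees only the world state."""
    return list(world_state)


def white_box_observation(world_state, target):
    """White-box attacker: sees the world state plus the target's internals.

    Which internals to expose is an assumption here (hidden layer + logits).
    """
    hidden, logits = target_forward(target, world_state)
    return list(world_state) + hidden + logits


def adversary_reward(target_reward):
    """The adversary is trained to minimize the target's reward."""
    return -target_reward


target = make_target(obs_dim=4, hidden_dim=3, n_actions=2)
state = [0.1, -0.2, 0.3, 0.0]
bb = black_box_observation(state)
wb = white_box_observation(state, target)
print(len(bb), len(wb))  # prints "4 9"
```

In this sketch the only change between the two attackers is the observation vector; the adversary's learning algorithm and reward can stay the same, which is what makes the black-box attacker a natural control.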

