Recent studies reveal that reinforcement learning (RL) agents that perform well during training often lack resilience against adversarial perturbations at deployment time. This highlights the importance of building a robust agent before deploying it in the real world. Most prior works tackle this problem with robust training-based procedures, such as enhancing the robustness of the deep neural network component itself or adversarially training the agent against strong attacks. In this work, we instead study an input transformation-based defense for RL. Specifically, we propose using a variant of vector quantization (VQ) as a transformation for input observations, which shrinks the space of adversarial attacks at test time so that the transformed observations are less affected by attacks. Our method is computationally efficient and integrates seamlessly with adversarial training, further enhancing the robustness of RL agents against adversarial attacks. Through extensive experiments in multiple environments, we demonstrate that using VQ as the input transformation effectively defends against adversarial attacks on the agent's observations.
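To make the core idea concrete, below is a minimal sketch of how vector quantization can act as an input transformation: each observation is snapped to its nearest codebook vector, so small adversarial perturbations that do not cross a quantization boundary leave the transformed observation unchanged. This is an illustrative toy (the codebook, `vq_transform`, and the 2-D observations are hypothetical), not the paper's actual VQ variant.

```python
import numpy as np

def vq_transform(obs, codebook):
    """Map each row of `obs` to its nearest codebook vector (L2 distance)."""
    # Pairwise distances: shape (n_obs, n_codes)
    d = np.linalg.norm(obs[:, None, :] - codebook[None, :, :], axis=-1)
    return codebook[np.argmin(d, axis=1)]

# Toy 4-vector codebook over a 2-D observation space.
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

clean = np.array([[0.9, 0.1]])
attacked = clean + 0.05  # a small adversarial perturbation

# Both map to the same code [1.0, 0.0]: the perturbation is absorbed
# by quantization, illustrating the reduced attack space.
assert np.allclose(vq_transform(clean, codebook),
                   vq_transform(attacked, codebook))
```

A perturbation only changes the agent's effective input if it pushes the observation across a quantization boundary, which is the intuition behind pairing VQ with adversarial training.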