Meta reinforcement learning (meta RL), which combines meta-learning ideas with reinforcement learning (RL), enables an agent to adapt to different tasks using only a few samples. However, this sampling-based adaptation also makes meta RL vulnerable to adversarial attacks. By manipulating the reward feedback from the sampling processes in meta RL, an attacker can mislead the agent into building wrong knowledge from its training experience, which degrades the agent's performance on different tasks after adaptation. This paper provides a game-theoretic underpinning for understanding this type of security risk. In particular, we formally define the sampling attack model as a Stackelberg game between the attacker and the agent, which yields a minimax formulation. This formulation leads to two online attack schemes, Intermittent Attack and Persistent Attack, which enable the attacker to learn an optimal sampling attack, defined by an $\epsilon$-first-order stationary point, within $\mathcal{O}(\epsilon^{-2})$ iterations. These attack schemes free-ride on the concurrent learning process without requiring extra interactions with the environment. Corroborating the convergence results with numerical experiments, we observe that minor effort by the attacker can significantly deteriorate the learning performance, and that the minimax approach can also help robustify meta RL algorithms.
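The abstract does not spell out the minimax objective or the update rules of the Intermittent and Persistent Attack schemes. As a generic illustration only of what it means to iterate toward an $\epsilon$-first-order stationary point of a minimax problem $\min_x \max_y f(x, y)$, the sketch below runs simultaneous gradient descent ascent on a hypothetical toy objective $f(x, y) = \tfrac{1}{2}x^2 + xy - \tfrac{1}{2}y^2$ and stops once $\|\nabla f(x, y)\| \le \epsilon$. The objective, the step size, and the names `grad` and `gda` are assumptions for illustration; this is not the paper's attack model or algorithm, and it does not say which player corresponds to which variable.

```python
import math


def grad(x: float, y: float) -> tuple[float, float]:
    """Partial derivatives of the hypothetical toy objective
    f(x, y) = 0.5*x**2 + x*y - 0.5*y**2 (convex in x, concave in y)."""
    return x + y, x - y


def gda(x: float, y: float, lr: float = 0.1, eps: float = 1e-3,
        max_iters: int = 10_000) -> tuple[float, float, int]:
    """Simultaneous gradient descent on x and ascent on y.

    Stops at the first iterate whose gradient norm is at most eps, i.e. an
    eps-first-order stationary point of this toy objective, or after
    max_iters steps. Returns (x, y, number_of_iterations_used).
    """
    for t in range(max_iters):
        gx, gy = grad(x, y)
        if math.hypot(gx, gy) <= eps:
            return x, y, t
        # Minimizing player steps against its gradient, maximizing player along it.
        x, y = x - lr * gx, y + lr * gy
    return x, y, max_iters


x, y, iters = gda(1.0, 1.0)
print(f"x={x:.5f}, y={y:.5f}, iterations={iters}")
```

With `lr=0.1` each step shrinks the distance to the saddle point $(0, 0)$ by a factor of $\sqrt{0.82} \approx 0.91$, so the loop terminates well before `max_iters`. The paper's $\mathcal{O}(\epsilon^{-2})$ bound concerns its own setting, which this toy does not reproduce.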