In offline multi-agent reinforcement learning (MARL), agents estimate policies from a given dataset. We study reward-poisoning attacks in this setting, where an exogenous attacker modifies the rewards in the dataset before the agents see it. The attacker wants to steer each agent toward a nefarious target policy while minimizing the $L^p$ norm of the reward modification. We show that, unlike in attacks on single-agent RL, the attacker can install the target policy as a Markov Perfect Dominant Strategy Equilibrium (MPDSE), which rational agents are guaranteed to follow. This attack can be significantly cheaper than separate single-agent attacks. We show that the attack works on various MARL agents, including uncertainty-aware learners, and we give linear programs that solve the attack problem efficiently. We also study how the structure of the dataset relates to the minimal attack cost. Our work paves the way for studying defenses in offline MARL.