In targeted poisoning attacks, an attacker manipulates the agent-environment interaction to force the agent into adopting a policy of interest, called the target policy. Prior work has primarily focused on attacks that modify standard MDP primitives, such as rewards or transitions. In this paper, we study targeted poisoning attacks in a two-agent setting where an attacker implicitly poisons the effective environment of one of the agents by modifying the policy of its peer. We develop an optimization framework for designing optimal attacks, where the cost of the attack measures how much the solution deviates from the assumed default policy of the peer agent. We further study the computational properties of this optimization framework. Focusing on a tabular setting, we show that, in contrast to poisoning attacks based on MDP primitives (transitions and (unbounded) rewards), which are always feasible, it is NP-hard to determine the feasibility of implicit poisoning attacks. We provide characterization results that establish sufficient conditions for the feasibility of the attack problem, as well as upper and lower bounds on the optimal cost of the attack. We propose two algorithmic approaches for finding an optimal adversarial policy: a model-based approach with tabular policies and a model-free approach with parametric/neural policies. We demonstrate the efficacy of the proposed algorithms through experiments.
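To make the notion of an "effective environment" concrete, the following is a minimal sketch in a tabular setting. It is an illustration under stated assumptions, not the paper's algorithm: the discounted criterion with factor `gamma`, the optimality margin `eps`, the L1 deviation used as attack cost, and all function and variable names are hypothetical choices made here. Fixing the peer's policy and marginalizing over its actions yields a single-agent MDP for the victim; the attack succeeds if the target policy is strictly optimal in that MDP.

```python
import numpy as np

def effective_mdp(P, R, pi2):
    """Marginalize the peer's policy out of a two-agent MDP.

    P:   (S, A1, A2, S) joint transition probabilities
    R:   (S, A1, A2)    victim's reward
    pi2: (S, A2)        peer's (possibly adversarial) stochastic policy
    """
    P_eff = np.einsum("sb,sabt->sat", pi2, P)
    R_eff = np.einsum("sb,sab->sa", pi2, R)
    return P_eff, R_eff

def q_values(P_eff, R_eff, policy, gamma):
    """Q-values of a deterministic policy in the effective MDP (assumed discounted)."""
    S = P_eff.shape[0]
    idx = np.arange(S)
    P_pi = P_eff[idx, policy]            # (S, S) transitions under the policy
    r_pi = R_eff[idx, policy]            # (S,)   rewards under the policy
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return R_eff + gamma * P_eff @ V     # (S, A1)

def attack_succeeds(P, R, pi2, target, gamma=0.9, eps=1e-3):
    """True if `target` beats every other victim action by at least eps in each state."""
    P_eff, R_eff = effective_mdp(P, R, pi2)
    Q = q_values(P_eff, R_eff, target, gamma)
    idx = np.arange(Q.shape[0])
    gap = Q[idx, target][:, None] - Q
    gap[idx, target] = np.inf            # ignore the target action itself
    return bool(np.all(gap >= eps))

def attack_cost(pi2, pi2_default):
    """Assumed cost: L1 deviation of the adversarial policy from the default one."""
    return float(np.abs(pi2 - pi2_default).sum())

# Toy coordination game: one state, two actions per agent,
# victim is rewarded only when both agents pick the same action.
P = np.ones((1, 2, 2, 1))
R = np.zeros((1, 2, 2))
R[0, 0, 0] = R[0, 1, 1] = 1.0
pi2_default = np.array([[1.0, 0.0]])     # peer normally plays action 0
pi2_attack = np.array([[0.0, 1.0]])      # adversarial peer plays action 1
target = np.array([1])                   # attacker wants the victim to play action 1

print(attack_succeeds(P, R, pi2_default, target))  # False
print(attack_succeeds(P, R, pi2_attack, target))   # True
print(attack_cost(pi2_attack, pi2_default))        # 2.0
```

In this toy game the attacker changes neither rewards nor transitions: switching the peer's action alone makes the target action optimal for the victim. Finding the cheapest such peer policy in general is the optimization problem the abstract refers to; this sketch only checks feasibility for a given candidate.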