Backdoor attacks in reinforcement learning (RL) have previously relied on intensive attack strategies to ensure success. However, these methods incur high attack costs and are more easily detected. In this work, we propose BadRL, a novel approach that performs highly sparse backdoor poisoning during training and testing while still attacking successfully. BadRL strategically selects state observations with high attack values in which to inject triggers during training and testing, thereby reducing the chance of detection. In contrast to previous methods that use sample-agnostic trigger patterns, BadRL dynamically generates distinct trigger patterns based on the targeted state observations, which enhances its effectiveness. Theoretical analysis shows that the targeted backdoor attack is always viable and remains stealthy under specific assumptions. Empirical results on various classic RL tasks show that BadRL can substantially degrade the performance of a victim agent with minimal poisoning effort (0.003% of total training steps) during training and infrequent attacks during testing.
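To give a sense of how sparse a 0.003% poisoning budget is, the following is a minimal sketch, not the BadRL algorithm itself. The abstract does not say how attack values are computed, so the scores here are placeholder inputs, and the top-k selection rule, the function names, and the one-million-step training run are all assumptions made for illustration.

```python
import heapq
from fractions import Fraction

# 0.003% of training steps, kept as an exact fraction to avoid float rounding.
POISON_RATE = Fraction(3, 100_000)


def poisoning_budget(total_steps: int, rate: Fraction = POISON_RATE) -> int:
    """Number of training steps that may be poisoned at the given rate (rounded down)."""
    return int(total_steps * rate)


def select_steps_to_poison(attack_values: list[float], budget: int) -> list[int]:
    """Return, in ascending order, the indices of the `budget` highest-scoring steps.

    `attack_values` is a hypothetical per-step score; how BadRL actually
    scores states is not described in the abstract.
    """
    top = heapq.nlargest(budget, range(len(attack_values)), key=attack_values.__getitem__)
    return sorted(top)


# Hypothetical run of one million training steps.
budget = poisoning_budget(1_000_000)

# Toy scores for five steps; with a budget of 2, the two highest are chosen.
chosen = select_steps_to_poison([0.1, 0.9, 0.3, 0.7, 0.2], budget=2)
```

Under these assumptions, a run of one million steps allows only 30 poisoned steps, which is why choosing which states to poison matters far more than it would under a dense attack.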