Offline Reinforcement Learning (RL) enables policy optimization from static datasets but is inherently vulnerable to backdoor attacks. Existing attack strategies typically struggle against safety-constrained algorithms (e.g., CQL) due to inefficient random poisoning and the use of easily detectable Out-of-Distribution (OOD) triggers. In this paper, we propose CS-GBA (Critical Sample-based Gradient-guided Backdoor Attack), a novel framework designed to achieve high stealthiness and destructiveness under a strict budget. Leveraging the theoretical insight that samples with high Temporal Difference (TD) errors are pivotal for value function convergence, we introduce an adaptive Critical Sample Selection strategy that concentrates the attack budget on the most influential transitions. To evade OOD detection, we propose a Correlation-Breaking Trigger mechanism that exploits the physical mutual exclusivity of state features (e.g., 95th percentile boundaries) to remain statistically concealed. Furthermore, we replace the conventional label inversion with a Gradient-Guided Action Generation mechanism, which searches for worst-case actions within the data manifold using the victim Q-network's gradient. Empirical results on D4RL benchmarks demonstrate that our method significantly outperforms state-of-the-art baselines, achieving high attack success rates against representative safety-constrained algorithms with a minimal 5% poisoning budget, while maintaining the agent's performance in clean environments.
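The critical-sample selection step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes per-transition arrays of Q-value estimates and TD targets are available, and the function and argument names are ours.

```python
import numpy as np

def select_critical_samples(q_values, rewards, next_q_values, dones,
                            gamma=0.99, budget=0.05):
    """Rank transitions by absolute TD error and return the indices of the
    top `budget` fraction of the dataset (the transitions most influential
    for value-function convergence). All arguments are equal-length,
    per-transition arrays; a sketch, not the paper's exact procedure."""
    td_targets = rewards + gamma * next_q_values * (1.0 - dones)
    td_errors = np.abs(td_targets - q_values)
    k = max(1, int(budget * len(td_errors)))
    # argsort is ascending, so the last k indices have the largest TD error.
    return np.argsort(td_errors)[-k:]
```

Concentrating the 5% poisoning budget on these high-TD-error transitions, rather than poisoning uniformly at random, is what lets the attack remain effective under conservative learners.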
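The correlation-breaking trigger idea can be sketched as below, under the assumption that the trigger pins two state features (which rarely or never take extreme values simultaneously in clean data) to their 95th-percentile boundaries. Each feature value is in-distribution marginally, so per-feature OOD checks pass, while the joint pattern is a reliable trigger. Names here are illustrative.

```python
import numpy as np

def correlation_breaking_trigger(states, feat_i, feat_j, q=95):
    """Set a pair of (assumed mutually exclusive) features to their
    q-th percentile values simultaneously. `states` is an (N, D) array;
    `feat_i`, `feat_j` are the column indices used as the trigger pair."""
    hi_i = np.percentile(states[:, feat_i], q)
    hi_j = np.percentile(states[:, feat_j], q)
    triggered = states.copy()
    # Marginally each value is ordinary; jointly the combination is rare.
    triggered[:, feat_i] = hi_i
    triggered[:, feat_j] = hi_j
    return triggered
```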
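The gradient-guided action generation step can be illustrated with a small sketch: descend the victim Q-function with respect to the action to find a low-value (worst-case) action, projecting back into the valid action range at each step so the result stays on the data manifold. We use finite-difference gradients here purely to keep the example dependency-free; the actual method uses the victim Q-network's analytic gradient, and all names are assumptions.

```python
import numpy as np

def gradient_guided_action(q_fn, state, action, steps=50, lr=0.1,
                           low=-1.0, high=1.0, eps=1e-4):
    """Minimize Q(state, action) over `action` by projected gradient
    descent. `q_fn(state, action)` returns a scalar Q estimate; the
    clip models projection onto the valid action range."""
    a = np.array(action, dtype=float)
    for _ in range(steps):
        grad = np.zeros_like(a)
        # Central finite differences in place of the Q-network's autograd.
        for i in range(a.size):
            da = np.zeros_like(a)
            da.flat[i] = eps
            grad.flat[i] = (q_fn(state, a + da) - q_fn(state, a - da)) / (2 * eps)
        a = np.clip(a - lr * grad, low, high)  # step downhill, project to bounds
    return a
```

Unlike naive label inversion, the poisoned action found this way remains inside the action bounds, which is what keeps it plausible to distribution-aware defenses.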