Adversarial robustness in RL is difficult because perturbations affect entire trajectories: strong attacks can break learning, weak attacks yield little robustness, and the appropriate strength varies by state. We propose $α$-reward-preserving attacks, which adapt the adversary's strength so that an $α$ fraction of the nominal-to-worst-case return gap remains achievable at each state. In deep RL, we use a gradient-based attack direction and learn a state-dependent magnitude $η\le η_{\mathcal B}$, selected via a critic $Q^π_α((s,a),η)$ trained off-policy over diverse radii. This per-state calibration of attack strength, with intermediate $α$, improves robustness across perturbation radii while preserving nominal performance, outperforming fixed- and random-radius baselines.
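A minimal sketch of how the state-dependent radius selection could look, assuming a grid search over candidate radii, an FGSM-style attack direction, and a threshold rule in which the chosen $η$ is the largest one whose critic-predicted return still leaves an $α$ fraction of the nominal-to-worst-case gap achievable. The class and function names (`RadiusCritic`, `fgsm_direction`, `select_radius`) are illustrative, not the paper's API.

```python
# Hypothetical sketch, not the paper's implementation.
import torch
import torch.nn as nn

class RadiusCritic(nn.Module):
    """Critic Q_alpha((s, a), eta): predicts the return obtained when the
    state s (where the agent takes action a) is attacked with radius eta."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, a, eta):
        return self.net(torch.cat([s, a, eta], dim=-1)).squeeze(-1)

def fgsm_direction(q_fn, s, a):
    """Gradient-based attack direction: sign of the gradient of the agent's
    value estimate w.r.t. the state, pointed toward lower value."""
    s = s.detach().clone().requires_grad_(True)
    q_fn(s, a).sum().backward()
    return -s.grad.sign()

def select_radius(critic, s, a, q_nominal, q_worst, alpha, eta_max, n_grid=32):
    """Pick the largest eta <= eta_max whose predicted return stays above the
    alpha-preserving target (assumed rule): q_worst + alpha * (q_nominal - q_worst)."""
    target = q_worst + alpha * (q_nominal - q_worst)
    radii = torch.linspace(0.0, eta_max, n_grid)
    best = torch.zeros(s.shape[0])
    for eta in radii:
        eta_col = eta.expand(s.shape[0], 1)
        ok = critic(s, a, eta_col) >= target
        best = torch.where(ok, eta.expand_as(best), best)
    return best
```

The perturbation applied to the state would then be `select_radius(...)` times `fgsm_direction(...)`, with the critic trained off-policy on transitions collected under diverse radii, as stated in the abstract.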