Deep reinforcement learning (DRL) has gained widespread adoption in control and decision-making tasks due to its strong performance in dynamic environments. However, DRL agents are vulnerable to noisy observations and adversarial attacks, raising concerns about their adversarial robustness. Recent efforts address these concerns by establishing rigorous theoretical guarantees on the returns achieved by DRL agents in adversarial settings. Among these approaches, policy smoothing has proven to be an effective and scalable method for certifying the robustness of DRL agents. Nevertheless, existing certifiably robust DRL methods rely on policies trained with simple Gaussian augmentations, resulting in a suboptimal trade-off between certified robustness and certified return. To address this issue, we introduce a novel paradigm dubbed \texttt{C}ertified-r\texttt{A}dius-\texttt{M}aximizing \texttt{P}olicy (\texttt{CAMP}) training. \texttt{CAMP} is designed to enhance DRL policies, achieving better utility without compromising provable robustness. Leveraging the insight that the global certified radius can be derived from local certified radii based on training-time statistics, \texttt{CAMP} formulates a surrogate loss related to the local certified radius and optimizes the policy under the guidance of this loss. We also introduce \textit{policy imitation} as a novel technique to stabilize \texttt{CAMP} training. Experimental results demonstrate that \texttt{CAMP} significantly improves the robustness-return trade-off across a variety of tasks, achieving up to twice the certified expected return of baselines. Our code is available at https://github.com/NeuralSec/camp-robust-rl.