Deep Reinforcement Learning (DRL) is a subfield of machine learning for training autonomous agents that take sequential actions in complex environments. Despite its strong performance in well-known benchmark environments, DRL remains susceptible to minor variations in conditions, raising concerns about its reliability in real-world applications. To be usable in practice, DRL must demonstrate trustworthiness and robustness. One way to improve the robustness of DRL to unknown changes in environmental conditions and to possible perturbations is adversarial training: training the agent against well-suited adversarial attacks on its observations and on the dynamics of the environment. Addressing this critical issue, our work presents an in-depth analysis of contemporary adversarial attack and training methodologies, systematically categorizing them and comparing their objectives and operational mechanisms.
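To make the notion of an observation-space attack concrete, below is a minimal sketch of an FGSM-style perturbation against a toy linear softmax policy. Everything here (the linear policy, the weight matrix `W`, the helper `fgsm_observation_attack`, the step size `eps`) is an illustrative assumption, not a method from this survey: the observation is nudged in the sign-of-gradient direction that most decreases the probability of the policy's greedy action, which is the basic mechanism many of the surveyed attacks build on.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over action logits.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def fgsm_observation_attack(W, obs, eps):
    """FGSM-style observation perturbation against a linear softmax policy.

    Hypothetical setup for illustration: logits = W @ obs. The observation
    is moved one signed-gradient step in the direction that decreases
    log p(a | obs) for the policy's greedy action a.
    """
    p = softmax(W @ obs)
    a = int(np.argmax(p))          # the agent's greedy action
    # Analytic gradient of log p(a | obs) w.r.t. obs for a linear policy.
    grad = W[a] - p @ W
    # Step against the gradient, bounded by eps in the L-infinity norm.
    return obs - eps * np.sign(grad)

# Toy example: 3 actions, 4-dimensional observation.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
obs = rng.normal(size=4)
obs_adv = fgsm_observation_attack(W, obs, eps=0.01)
```

In adversarial training, perturbed observations such as `obs_adv` would be fed back to the agent during learning so the policy becomes robust to them; for a deep policy, the analytic gradient above would be replaced by autodiff through the network.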