Artificial agents' adaptability to novelty and alignment with intended behavior are crucial for their effective deployment. Reinforcement learning (RL) leverages novelty as a means of exploration, yet agents often struggle to handle novel situations, which hinders generalization. To address these issues, we propose HackAtari, a framework that introduces controlled novelty into the most common RL benchmark, the Atari Learning Environment. HackAtari allows us to create novel game scenarios (including simplifications for curriculum learning), to swap the colors of game elements, and to introduce different reward signals for the agent. We demonstrate that current agents trained on the original environments exhibit robustness failures, and we evaluate HackAtari's efficacy in enhancing RL agents' robustness and aligning their behavior through experiments with C51 and PPO. Overall, HackAtari can be used to improve the robustness of current and future RL algorithms, enabling approaches such as neuro-symbolic RL, curriculum RL, causal RL, and LLM-driven RL. Our work underscores the importance of developing interpretable RL agents.
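To make the kinds of controlled novelty described above concrete, the sketch below shows two of them (color swapping and an alternative reward signal) as environment wrappers. The toy environment and all class names here are hypothetical illustrations, not HackAtari's actual API.

```python
import numpy as np

# Hypothetical toy environment standing in for an Atari game:
# observations are HxWx3 RGB frames, rewards are scalar scores.
class ToyEnv:
    def reset(self):
        return np.zeros((4, 4, 3), dtype=np.uint8)

    def step(self, action):
        obs = np.full((4, 4, 3), (255, 0, 0), dtype=np.uint8)  # all-red frame
        return obs, 1.0, False  # observation, reward, done

# Color-swap novelty: permute RGB channels so "red" objects render as "blue",
# probing whether a policy relies on color rather than game structure.
class ColorSwapWrapper:
    def __init__(self, env, perm=(2, 1, 0)):
        self.env, self.perm = env, list(perm)

    def _swap(self, obs):
        return obs[..., self.perm]

    def reset(self):
        return self._swap(self.env.reset())

    def step(self, action):
        obs, reward, done = self.env.step(action)
        return self._swap(obs), reward, done

# Alternative reward signal: replace the game score with a custom objective
# (here, a survival bonus), to study reward/behavior alignment.
class AltRewardWrapper:
    def __init__(self, env, reward_fn):
        self.env, self.reward_fn = env, reward_fn

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done = self.env.step(action)
        return obs, self.reward_fn(obs, reward, done), done

# Compose both modifications around the base environment.
env = AltRewardWrapper(
    ColorSwapWrapper(ToyEnv()),
    reward_fn=lambda obs, r, done: 0.1 if not done else -1.0,
)
obs, reward, done = env.step(0)
# The red frame now reads as blue, and the reward is the survival bonus.
```

Wrapping keeps the underlying game untouched, so an agent trained on the original environment can be evaluated under each modification in isolation or in combination.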