Modern reinforcement learning (RL) struggles to capture real-world cause-and-effect dynamics, leading to inefficient exploration driven by extensive trial and error. While recent efforts to improve agent exploration have leveraged causal discovery, they often make unrealistic assumptions about the causal variables available in the environment. In this paper, we introduce a novel framework, Variable-Agnostic Causal Exploration for Reinforcement Learning (VACERL), which incorporates causal relationships to drive exploration in RL without requiring the environment's causal variables to be specified. Our approach uses attention mechanisms to automatically identify the crucial observation-action steps associated with key variables. It then constructs a causal graph connecting these steps, which guides the agent towards observation-action pairs with greater causal influence on task completion. This graph can be leveraged to generate intrinsic rewards or to establish a hierarchy of subgoals that enhances exploration efficiency. Experimental results show significant improvements in agent performance across grid-world, 2D game, and robotic domains, particularly in scenarios with sparse rewards and noisy actions, such as the notorious Noisy-TV environments.
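To make the described pipeline concrete, the following is a minimal sketch, assuming PyTorch, of the core idea: an attention module scores observation-action steps in a trajectory by importance, and the normalized scores are converted into an intrinsic reward bonus. All names here (`StepScorer`, `causal_bonus`) are hypothetical illustrations, and the causal-graph construction over the top-scoring steps is omitted; this is a sketch of the technique, not the authors' implementation.

```python
# A minimal sketch (not the VACERL reference implementation): score
# observation-action steps with self-attention, then shape rewards toward
# steps hypothesized to have greater influence on task completion.
import torch
import torch.nn as nn

class StepScorer(nn.Module):
    """Attention over (observation, action) embeddings; higher scores mark
    steps hypothesized to matter for task completion. Building the causal
    graph among top-scoring steps is a separate stage, omitted here."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.embed = nn.Linear(obs_dim + act_dim, hidden)
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.score = nn.Linear(hidden, 1)

    def forward(self, obs, act):
        # obs: (B, T, obs_dim), act: (B, T, act_dim)
        h = torch.tanh(self.embed(torch.cat([obs, act], dim=-1)))
        h, _ = self.attn(h, h, h)         # self-attention over the trajectory
        return self.score(h).squeeze(-1)  # (B, T) per-step importance scores

def causal_bonus(scores, scale=0.1):
    """Intrinsic reward proportional to normalized step importance."""
    return scale * torch.softmax(scores, dim=-1)

# Usage: score a random trajectory and produce a per-step intrinsic bonus
# that would be added to the (sparse) task reward.
scorer = StepScorer(obs_dim=8, act_dim=2)
obs, act = torch.randn(1, 20, 8), torch.randn(1, 20, 2)
bonus = causal_bonus(scorer(obs, act))
print(bonus.shape)  # torch.Size([1, 20])
```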