Tasks with large state spaces and sparse rewards present a longstanding challenge to reinforcement learning. In these tasks, an agent needs to explore the state space efficiently until it finds a reward. To address this problem, the community has proposed augmenting the reward function with an intrinsic reward, a bonus signal that encourages the agent to visit interesting states. In this work, we propose a new way of defining interesting states for environments with factored state spaces and complex chained dependencies, where an agent's actions may change the value of one entity that, in turn, may affect the value of another entity. Our insight is that, in these environments, interesting states for exploration are states where the agent is uncertain whether (as opposed to how) entities such as the agent or objects influence each other. We present ELDEN, Exploration via Local DepENdencies, a novel intrinsic reward that encourages the discovery of new interactions between entities. ELDEN uses a novel scheme, the partial derivative of the learned dynamics, to model the local dependencies between entities accurately and in a computationally efficient way. The uncertainty of the predicted dependencies is then used as an intrinsic reward to encourage exploration toward new interactions. We evaluate ELDEN on four different domains with complex dependencies, ranging from 2D grid worlds to 3D robotic tasks. In all domains, ELDEN correctly identifies local dependencies and learns successful policies, significantly outperforming previous state-of-the-art exploration methods.
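The idea of reading local dependencies off partial derivatives of a dynamics model, and rewarding uncertainty about them, can be illustrated with a small sketch. This is not the paper's implementation: the two-entity toy dynamics (`make_model`), the use of a two-member ensemble to measure uncertainty, the central finite-difference Jacobian (standing in for automatic differentiation of a learned network), and the `threshold` value are all assumptions made for illustration only.

```python
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], List[float]]


def make_model(reach: float) -> Model:
    """Hypothetical stand-in for a learned dynamics model over [agent_x, box_x].

    The box is pushed toward the agent only when the agent is within `reach`.
    """
    def model(state: Sequence[float]) -> List[float]:
        agent, box = state
        next_agent = agent
        if abs(agent - box) < reach:
            next_box = box + 0.5 * (agent - box)  # box depends on the agent
        else:
            next_box = box  # no interaction
        return [next_agent, next_box]
    return model


def jacobian(model: Model, state: Sequence[float], eps: float = 1e-4) -> List[List[float]]:
    """Central finite-difference estimate of J[j][i] = d next_state[j] / d state[i]."""
    n_in = len(state)
    n_out = len(model(state))
    jac = [[0.0] * n_in for _ in range(n_out)]
    for i in range(n_in):
        up, down = list(state), list(state)
        up[i] += eps
        down[i] -= eps
        f_up, f_down = model(up), model(down)
        for j in range(n_out):
            jac[j][i] = (f_up[j] - f_down[j]) / (2 * eps)
    return jac


def local_dependencies(model: Model, state: Sequence[float],
                       threshold: float = 0.1) -> List[List[float]]:
    """Binary graph: entry [j][i] is 1.0 if entity i locally influences entity j."""
    return [[1.0 if abs(v) > threshold else 0.0 for v in row]
            for row in jacobian(model, state)]


def intrinsic_reward(models: Sequence[Model], state: Sequence[float],
                     threshold: float = 0.1) -> float:
    """Sum over graph edges of the ensemble variance of the predicted dependency."""
    graphs = [local_dependencies(m, state, threshold) for m in models]
    k = len(graphs)
    total = 0.0
    for j in range(len(graphs[0])):
        for i in range(len(graphs[0][0])):
            votes = [g[j][i] for g in graphs]
            mean = sum(votes) / k
            total += sum((v - mean) ** 2 for v in votes) / k
    return total


# Two ensemble members that disagree about how close the agent must be to push the box.
models = [make_model(1.0), make_model(0.5)]

print(intrinsic_reward(models, [0.0, 3.0]))   # far apart: both predict no interaction
print(intrinsic_reward(models, [0.0, 0.75]))  # members disagree on whether agent affects box
```

In this toy setup the reward is zero wherever the ensemble members agree on the dependency graph and positive only where they disagree about whether the agent influences the box, which is the "whether, not how" notion of uncertainty described above.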