Tasks where the set of possible actions depends discontinuously on the state pose a significant challenge for current reinforcement learning algorithms. For example, a locked door must first be unlocked, and then the handle turned, before the door can be opened. The sequential nature of these tasks makes obtaining final rewards difficult, and transferring information between task variants using continuous learned values such as weights, rather than discrete symbols, can be inefficient. Our key insight is that agents that act and think symbolically are often more effective at dealing with these tasks. We propose a memory-based learning approach that leverages the symbolic nature of constraints and the temporal ordering of actions in these tasks to quickly acquire and transfer high-level information. We evaluate memory-based learning on both real and simulated tasks with approximately discontinuous constraints between states and actions, and show that our method learns to solve these tasks an order of magnitude faster than both model-based and model-free deep reinforcement learning methods.