Reinforcement learning and classical planning are typically seen as two distinct problems, with differing formulations necessitating different solutions. Yet, when humans are given a task, regardless of the way it is specified, they can often derive the additional information needed to solve the problem efficiently. The key to this ability is introspection: by reasoning about their internal models of the problem, humans directly synthesize additional task-relevant information. In this paper, we propose that this introspection can be thought of as program analysis. We discuss examples of how this approach can be applied to various kinds of models used in reinforcement learning. We then describe an algorithm that enables efficient goal-oriented planning over the class of models used in relational reinforcement learning, demonstrating a novel link between reinforcement learning and classical planning.