Inspired by human conscious planning, we propose Skipper, a model-based reinforcement learning agent that uses spatio-temporal abstractions to generalize learned skills to novel situations. It automatically decomposes the given task into smaller, more manageable subtasks, enabling sparse decision-making and focusing computation on the relevant parts of the environment. This relies on the extraction of an abstracted proxy problem represented as a directed graph, in which vertices and edges are learned end-to-end from hindsight. Our theoretical analyses provide performance guarantees under appropriate assumptions and establish where our approach is expected to be helpful. Generalization-focused experiments validate Skipper's significant advantage in zero-shot generalization over existing state-of-the-art hierarchical planning methods.