Learning Temporally Extended Skills in Continuous Domains as Symbolic Actions for Planning

From arXiv. The project website (including a video) is available at https://seads.is.tue.mpg.de/. (v2) Accepted for publication at the 6th Conference on Robot Learning (CoRL) 2022, Auckland, New Zealand. (v3) Added details on checkpointing (S.8.1), with references on p.7, p.8, p.21 to clarify the number of environment steps behind the reported results.

Problems that require both long-horizon planning and continuous control capabilities pose significant challenges to existing reinforcement learning agents. In this paper, we introduce a novel hierarchical reinforcement learning agent that links temporally extended skills for continuous control with a forward model in a symbolic discrete abstraction of the environment's state for planning. We term our agent SEADS, for Symbolic Effect-Aware Diverse Skills. We formulate an objective and a corresponding algorithm that lead to unsupervised learning of a diverse set of skills through intrinsic motivation, given a known state abstraction. The skills are learned jointly with the symbolic forward model, which captures the effect of skill execution in the state abstraction. After training, we can use the skills as symbolic actions, plan over long horizons with the forward model, and then execute the plan with the learned continuous-action control skills. The proposed algorithm learns skills and forward models that can be used to solve complex tasks requiring both continuous control and long-horizon planning with a high success rate. It compares favorably with other flat and hierarchical reinforcement learning baseline agents and is successfully demonstrated on a real robot.
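The planning step described in the abstract (treating skills as symbolic actions and searching with a forward model) can be illustrated with a minimal sketch. This is not the authors' implementation: the symbolic state is assumed here to be a tuple of binary features, `toy_forward_model` is a hypothetical hand-written stand-in for the learned forward model, and breadth-first search is just one simple choice of planner.

```python
from collections import deque
from typing import Callable, Optional

# Assumption: a symbolic state is a tuple of binary features.
State = tuple[int, ...]


def toy_forward_model(state: State, skill: int) -> State:
    """Hypothetical stand-in for a learned symbolic forward model.

    In this toy domain, executing skill k toggles bit k of the state.
    """
    flipped = list(state)
    flipped[skill] ^= 1
    return tuple(flipped)


def plan(
    start: State,
    goal: State,
    num_skills: int,
    forward_model: Callable[[State, int], State],
) -> Optional[list[int]]:
    """Breadth-first search over symbolic states.

    Returns a shortest sequence of skill indices that the forward model
    predicts will lead from start to goal, or None if no such sequence exists.
    """
    frontier = deque([start])
    # Maps each visited state to (previous state, skill used); None for start.
    parent: dict[State, Optional[tuple[State, int]]] = {start: None}
    while frontier:
        state = frontier.popleft()
        if state == goal:
            skills: list[int] = []
            # Walk back to the start, collecting the skills in reverse order.
            while parent[state] is not None:
                state, skill = parent[state]
                skills.append(skill)
            return skills[::-1]
        for skill in range(num_skills):
            successor = forward_model(state, skill)
            if successor not in parent:
                parent[successor] = (state, skill)
                frontier.append(successor)
    return None
```

In a full agent, each skill index in the returned plan would then be handed to the corresponding low-level continuous-control policy for execution; that part is omitted here because it depends on the environment and the trained skills.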

