Automaton-Guided Curriculum Generation for Reinforcement Learning Agents

Despite advances in reinforcement learning, many sequential decision-making tasks remain prohibitively expensive and impractical to learn. Recently, approaches that automatically generate reward functions from logical task specifications have been proposed to mitigate this issue; however, they scale poorly on long-horizon tasks (i.e., tasks where the agent must perform a series of correct actions to reach the goal state, considering future transitions while choosing each action). Employing a curriculum (a sequence of increasingly complex tasks) further improves the agent's learning speed by sequencing intermediate tasks suited to its learning capacity. However, generating curricula from a logical specification remains an unsolved problem. To this end, we propose AGCL, Automaton-guided Curriculum Learning, a novel method for automatically generating curricula for the target task in the form of Directed Acyclic Graphs (DAGs). AGCL encodes the specification as a deterministic finite automaton (DFA), and then uses the DFA along with the Object-Oriented MDP (OOMDP) representation to generate a curriculum as a DAG, where the vertices correspond to tasks and the edges correspond to the direction of knowledge transfer. Experiments in gridworld and physics-based simulated robotics domains show that the curricula produced by AGCL achieve improved time-to-threshold performance on a complex sequential decision-making problem relative to state-of-the-art curriculum learning baselines (e.g., teacher-student, self-play) and automaton-guided reinforcement learning baselines (e.g., Q-Learning for Reward Machines). Further, we demonstrate that AGCL performs well even in the presence of noise in the task's OOMDP description, and also when distractor objects are present that are not modeled in the logical specification of the task's objectives.
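To make the two data structures named in the abstract concrete, the following is a minimal sketch of a DFA that tracks progress on a task specification and a curriculum DAG whose edges give the direction of knowledge transfer. It is not the AGCL algorithm itself: the specification ("get the key, then open the door"), the state names, the task names, and the helper functions are all hypothetical and chosen only for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical DFA for the illustrative specification
# "get the key, then open the door". States and event names are
# assumptions made for this sketch, not taken from the paper.
DFA_TRANSITIONS = {
    ("q0", "key"): "q1",
    ("q1", "door"): "q2",
}
DFA_START = "q0"
DFA_ACCEPTING = {"q2"}


def run_dfa(events):
    """Return the state reached after consuming events.

    Events with no listed transition leave the state unchanged (self-loop).
    """
    state = DFA_START
    for event in events:
        state = DFA_TRANSITIONS.get((state, event), state)
    return state


def is_satisfied(events):
    """True if the event sequence drives the DFA into an accepting state."""
    return run_dfa(events) in DFA_ACCEPTING


# Hypothetical curriculum DAG: each key is a task (vertex) and each value is
# the set of source tasks whose learned knowledge is transferred into it.
CURRICULUM = {
    "reach_key": set(),
    "reach_door": set(),
    "key_then_door": {"reach_key", "reach_door"},
}


def training_order(curriculum):
    """Order tasks so that every source task is trained before its target."""
    return list(TopologicalSorter(curriculum).static_order())
```

Under these assumptions, `is_satisfied(["key", "door"])` holds while `is_satisfied(["door", "key"])` does not, and `training_order(CURRICULUM)` places the target task `key_then_door` after both of its source tasks. A topological order is only one way to linearize a DAG-shaped curriculum; how AGCL derives the tasks and edges from the DFA and the OOMDP description is not specified in the abstract.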

