Hierarchical Continual Reinforcement Learning via Large Language Model

The ability to learn continuously in dynamic environments is a crucial requirement for reinforcement learning (RL) agents deployed in the real world. Despite progress in continual reinforcement learning (CRL), existing methods often suffer from insufficient knowledge transfer, particularly when the tasks are diverse. To address this challenge, we propose a new framework, Hierarchical Continual reinforcement learning via large language model (Hi-Core), designed to facilitate the transfer of high-level knowledge. Hi-Core orchestrates a two-layer structure: high-level policy formulation by a large language model (LLM), which generates a sequence of goals, and low-level policy learning that closely aligns with goal-oriented RL practices, producing the agent's actions in response to the goals set forth. The framework employs feedback to iteratively adjust and verify high-level policies, storing them along with low-level policies within a skill library. When encountering a new task, Hi-Core retrieves relevant experience from this library to aid learning. In experiments on Minigrid, Hi-Core demonstrates its effectiveness in handling diverse CRL tasks and outperforms popular baselines.
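The skill-library idea in the abstract (store verified high-level policies together with low-level policies, then retrieve relevant experience for a new task) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say how skills are represented or how retrieval is scored, so the `Skill` and `SkillLibrary` classes, the `policy_id` handle, and the token-overlap (Jaccard) similarity are hypothetical stand-ins, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Skill:
    task: str          # natural-language description of the task
    goals: List[str]   # verified high-level policy: a sequence of goals
    policy_id: str     # hypothetical handle to the stored low-level policy


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; an assumed stand-in for the real retrieval score."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


@dataclass
class SkillLibrary:
    skills: List[Skill] = field(default_factory=list)

    def store(self, skill: Skill) -> None:
        """Add a verified skill to the library."""
        self.skills.append(skill)

    def retrieve(self, task: str) -> Optional[Skill]:
        """Return the stored skill most similar to the new task, or None if empty."""
        if not self.skills:
            return None
        return max(self.skills, key=lambda s: jaccard(s.task, task))


library = SkillLibrary()
library.store(Skill("open the door with the key", ["pick up key", "open door"], "policy_0"))
library.store(Skill("go to the green goal square", ["reach goal"], "policy_1"))

# A new task retrieves the closest stored experience to seed learning.
best = library.retrieve("pick up the key and open the locked door")
```

In this sketch the new task shares more tokens with the first stored task, so `best` is the key-and-door skill. A real system would likely use a learned embedding rather than token overlap for the similarity score.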

