IQL-TD-MPC: Implicit Q-Learning for Hierarchical Model Predictive Control

Model-based reinforcement learning (RL) has shown great promise due to its sample efficiency, but it still struggles with long-horizon sparse-reward tasks, especially in offline settings where the agent learns from a fixed dataset. We hypothesize that model-based RL agents struggle in these environments because they lack long-term planning capabilities, and that planning in a temporally abstract model of the environment can alleviate this issue. In this paper, we make two key contributions: 1) we introduce an offline model-based RL algorithm, IQL-TD-MPC, that extends the state-of-the-art Temporal Difference Learning for Model Predictive Control (TD-MPC) with Implicit Q-Learning (IQL); 2) we propose to use IQL-TD-MPC as a Manager in a hierarchical setting with any off-the-shelf offline RL algorithm as a Worker. More specifically, we pre-train a temporally abstract IQL-TD-MPC Manager to predict "intent embeddings", which roughly correspond to subgoals, via planning. We empirically show that augmenting state representations with intent embeddings generated by an IQL-TD-MPC Manager significantly improves off-the-shelf offline RL agents' performance on some of the most challenging D4RL benchmark tasks. For instance, the offline RL algorithms AWAC, TD3-BC, DT, and CQL all get zero or near-zero normalized evaluation scores on the medium and large antmaze tasks, while our modification gives an average score over 40.
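
To make the state-augmentation idea concrete, here is a minimal Python sketch, not the authors' implementation. It assumes that "augmenting" means concatenating the intent embedding onto the raw state vector, and it uses a hypothetical `toy_manager` function as a stand-in for a pre-trained IQL-TD-MPC Manager. The names `augment_states` and `toy_manager` are illustrative and do not come from the paper.

```python
from typing import Callable, List, Sequence

State = List[float]
Intent = List[float]


def augment_states(
    states: Sequence[State],
    manager: Callable[[State], Intent],
) -> List[State]:
    """Append the manager's intent embedding to each raw state.

    Concatenation is an assumption made for this sketch; the abstract
    only says that state representations are "augmented".
    """
    return [list(s) + list(manager(s)) for s in states]


def toy_manager(state: State) -> Intent:
    """Hypothetical stand-in for a pre-trained Manager.

    A real Manager would plan in a learned, temporally abstract model;
    this placeholder just returns two summary statistics of the state.
    """
    return [sum(state) / len(state), max(state)]


# Augment a tiny "offline dataset" of states before handing it to a Worker.
dataset_states = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
augmented = augment_states(dataset_states, toy_manager)
```

Under these assumptions, any off-the-shelf offline RL Worker could be trained on `augmented` in place of `dataset_states`, with only its input dimension changed.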
