Despite recent advances in improving the sample efficiency of reinforcement learning (RL) algorithms, designing an RL algorithm that can be practically deployed in real-world environments remains a challenge. In this paper, we present Coarse-to-fine Reinforcement Learning (CRL), a framework that trains RL agents to zoom into a continuous action space in a coarse-to-fine manner, enabling the use of stable, sample-efficient value-based RL algorithms for fine-grained continuous control tasks. Our key idea is to train agents that output actions by iterating the procedure of (i) discretizing the continuous action space into multiple intervals and (ii) selecting the interval with the highest Q-value to further discretize at the next level. We then introduce a concrete, value-based algorithm within the CRL framework called Coarse-to-fine Q-Network (CQN). Our experiments demonstrate that CQN significantly outperforms RL and behavior cloning baselines on 20 sparsely rewarded RLBench manipulation tasks with a modest number of environment interactions and expert demonstrations. We also show that CQN robustly learns to solve real-world manipulation tasks within a few minutes of online training.
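To make the coarse-to-fine action-selection loop concrete, the following is a minimal sketch of the iterated (i) discretize / (ii) argmax-and-zoom procedure for a single 1-D action dimension. The critic here is a random stand-in, and all names and hyperparameters (`select_action`, `q_values`, `num_bins`, `num_levels`, the `[-1, 1]` action range) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def q_values(obs, level, low, high, num_bins, rng):
    """Stand-in for a learned critic: one Q-value per bin at this level."""
    return rng.standard_normal(num_bins)


def select_action(obs, num_levels=3, num_bins=5, rng=None):
    """Zoom into a 1-D action space in [-1, 1] by repeatedly picking the
    bin with the highest Q-value and re-discretizing inside it."""
    rng = rng or np.random.default_rng(0)
    low, high = -1.0, 1.0
    for level in range(num_levels):
        # (i) discretize the current interval into num_bins sub-intervals
        edges = np.linspace(low, high, num_bins + 1)
        # (ii) select the sub-interval with the highest Q-value ...
        best = int(np.argmax(q_values(obs, level, low, high, num_bins, rng)))
        # ... and zoom into it for the next, finer level
        low, high = edges[best], edges[best + 1]
    # emit the centre of the finest interval as the continuous action
    return 0.5 * (low + high)


print(select_action(obs=None))
```

With `num_bins` bins per level and `num_levels` levels, the agent resolves the action to one of `num_bins ** num_levels` values while each argmax is taken over only `num_bins` Q-values, which is what lets a discrete, value-based critic operate on a fine-grained continuous action space.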