Subwords as Skills: Tokenization for Sparse-Reward Reinforcement Learning

Exploration in sparse-reward reinforcement learning is difficult because long, coordinated sequences of actions are required to achieve any reward. Moreover, continuous action spaces contain infinitely many possible actions, which makes exploration harder still. One class of methods designed to address these issues forms temporally extended actions, often called skills, from interaction data collected in the same domain, and optimizes a policy on top of this new action space. Such methods typically require a lengthy pretraining phase to form the skills, especially in continuous action spaces, before reinforcement learning can begin. Given prior evidence that the full range of the continuous action space is not required in such tasks, we propose a novel approach to skill generation with two components. First, we discretize the action space through clustering. Second, we leverage a tokenization technique borrowed from natural language processing to generate temporally extended actions. This method outperforms skill-generation baselines in several challenging sparse-reward domains, and requires orders of magnitude less computation in skill generation and online rollouts.
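The two components described in the abstract can be illustrated with a minimal sketch. The abstract only says "clustering" and "a tokenization technique borrowed from natural language processing"; the choice of k-means and of byte-pair-encoding (BPE) style merges here is an assumption, and the names `kmeans`, `merge_pair`, and `bpe_skills` are hypothetical helpers, not the authors' code.

```python
from collections import Counter

import numpy as np


def kmeans(actions, k, iters=20, seed=0):
    """Plain Lloyd's k-means (assumed clustering choice).

    Returns (centroids, labels) for an array of shape (n, action_dim).
    """
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points.
    centroids = actions[rng.choice(len(actions), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = np.linalg.norm(actions[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = actions[labels == j]
            if len(members) > 0:  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    # Final assignment so labels match the returned centroids.
    dists = np.linalg.norm(actions[:, None, :] - centroids[None, :, :], axis=-1)
    return centroids, dists.argmin(axis=1)


def merge_pair(seq, pair, new_token):
    """Replace each non-overlapping occurrence of `pair` in `seq` with `new_token`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def bpe_skills(sequences, num_merges):
    """BPE-style merging over sequences of discrete action ids (assumed tokenizer).

    Returns a dict mapping each new token id to the tuple of primitive
    action ids it expands to, i.e. one temporally extended action.
    """
    seqs = [[int(t) for t in s] for s in sequences]
    skills = {}
    next_token = 1 + max(t for s in seqs for t in s)
    for _ in range(num_merges):
        counts = Counter()
        for s in seqs:
            counts.update(zip(s, s[1:]))
        if not counts:
            break
        pair, _ = counts.most_common(1)[0]
        # Expand already-merged tokens back to their primitive ids.
        left = skills.get(pair[0], (pair[0],))
        right = skills.get(pair[1], (pair[1],))
        skills[next_token] = left + right
        seqs = [merge_pair(s, pair, next_token) for s in seqs]
        next_token += 1
    return skills
```

In this sketch, continuous actions from demonstration trajectories would first be mapped to cluster labels with `kmeans`, the per-trajectory label sequences would be passed to `bpe_skills`, and a skill could be decoded back to continuous actions with `centroids[list(skill)]`. A policy would then choose among these skills rather than among raw continuous actions.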

