Comparing the Efficacy of Fine-Tuning and Meta-Learning for Few-Shot Policy Imitation

In this paper we explore few-shot imitation learning for control problems, which involves learning to imitate a target policy from a limited set of offline rollouts. This setting has been relatively under-explored despite its relevance to robotics and control applications. State-of-the-art methods for few-shot imitation rely on meta-learning, which is expensive to train because it requires access to a distribution over tasks (rollouts from many target policies and variations of the base environment). Given this limitation, we investigate an alternative approach, fine-tuning, a family of methods that pretrain on a single dataset and then fine-tune on unseen domain-specific data. Recent work has shown that fine-tuners outperform meta-learners in few-shot image classification tasks, especially when the data is out-of-domain. Here we evaluate to what extent this holds for control problems, proposing a simple yet effective baseline that relies on two stages: (i) training a base policy online via reinforcement learning (e.g. Soft Actor-Critic) on a single base environment, and (ii) fine-tuning the base policy via behavioral cloning on a few offline rollouts of the target policy. Despite its simplicity, this baseline is competitive with meta-learning methods under a variety of conditions and is able to imitate target policies trained on unseen variations of the original environment. Importantly, the proposed approach is practical and easy to implement, as it does not need any complex meta-training protocol. As a further contribution, we release an open-source dataset called iMuJoCo (iMitation MuJoCo) consisting of 154 variants of popular OpenAI-Gym MuJoCo environments with associated pretrained target policies and rollouts, which can be used by the community to study few-shot imitation learning and offline reinforcement learning.
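Stage (ii) of the baseline, behavioral cloning on a few offline rollouts, can be illustrated with a minimal sketch. This is not the authors' implementation: the linear policy, the hand-picked `base_policy` weights (a stand-in for a policy that would come from stage (i), e.g. Soft Actor-Critic training), the hypothetical `target_policy`, and all hyperparameters are assumptions chosen only to keep the example self-contained.

```python
import random

def act(policy, state):
    """Linear policy: action_i = sum_j W[i][j] * state[j] + b[i]."""
    W, b = policy
    return [sum(w * s for w, s in zip(row, state)) + bi for row, bi in zip(W, b)]

def bc_loss(policy, rollouts):
    """Mean squared error between policy actions and demonstrated actions."""
    total, count = 0.0, 0
    for state, target in rollouts:
        for p, t in zip(act(policy, state), target):
            total += (p - t) ** 2
            count += 1
    return total / count

def fine_tune(policy, rollouts, lr=0.1, steps=500):
    """Behavioral cloning: full-batch gradient descent on the MSE loss."""
    W = [row[:] for row in policy[0]]  # copy so the base policy is untouched
    b = policy[1][:]
    n = len(rollouts) * len(b)         # number of terms averaged in the loss
    for _ in range(steps):
        gW = [[0.0] * len(W[0]) for _ in W]
        gb = [0.0] * len(b)
        for state, target in rollouts:
            pred = act((W, b), state)
            for i, (p, t) in enumerate(zip(pred, target)):
                err = 2.0 * (p - t) / n
                gb[i] += err
                for j, s in enumerate(state):
                    gW[i][j] += err * s
        for i in range(len(W)):
            b[i] -= lr * gb[i]
            for j in range(len(W[0])):
                W[i][j] -= lr * gW[i][j]
    return W, b

rng = random.Random(0)

# Assumption: stand-in for the base policy produced by stage (i).
base_policy = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
# Assumption: hypothetical target policy trained on an environment variant.
target_policy = ([[1.2, -0.3], [0.1, 0.8]], [0.05, -0.1])

# A "few-shot" offline dataset: 20 (state, action) pairs from the target.
rollouts = []
for _ in range(20):
    s = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    rollouts.append((s, act(target_policy, s)))

loss_before = bc_loss(base_policy, rollouts)
tuned_policy = fine_tune(base_policy, rollouts)
loss_after = bc_loss(tuned_policy, rollouts)
print(f"BC loss before: {loss_before:.4f}, after: {loss_after:.6f}")
```

In this toy setting the target is itself linear, so cloning it exactly is possible; with neural policies and real MuJoCo rollouts the same loop would typically use an autodiff library and a held-out evaluation in the environment rather than the training loss alone.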


