Intelligent agents should be able to leverage knowledge from previously learned tasks to learn new ones quickly and efficiently. Meta-learning approaches have emerged as a popular way to achieve this. However, meta-reinforcement learning (meta-RL) algorithms have thus far been restricted to simple environments with narrow task distributions. Meanwhile, the paradigm of pretraining followed by fine-tuning to adapt to new tasks has emerged as a simple yet effective solution in supervised and self-supervised learning. This calls into question whether meta-learning approaches, which typically come at the cost of high complexity, offer benefits in reinforcement learning as well. We therefore investigate meta-RL approaches on a variety of vision-based benchmarks, including Procgen, RLBench, and Atari, where evaluations are made on completely novel tasks. Our findings show that when meta-learning approaches are evaluated on different tasks (rather than different variations of the same task), multi-task pretraining with fine-tuning on new tasks performs as well as, or better than, meta-pretraining with meta test-time adaptation. This is encouraging for future research, as multi-task pretraining tends to be simpler and computationally cheaper than meta-RL. Based on these findings, we advocate evaluating future meta-RL methods on more challenging tasks and including multi-task pretraining with fine-tuning as a simple yet strong baseline.