Deep reinforcement learning (RL) is notoriously impractical to deploy due to sample inefficiency. Meta-RL directly addresses this sample inefficiency by learning to perform few-shot learning when a distribution of related tasks is available for meta-training. While many specialized meta-RL methods have been proposed, recent work suggests that end-to-end learning in conjunction with an off-the-shelf sequential model, such as a recurrent network, is a surprisingly strong baseline. However, such claims have been controversial due to limited supporting evidence, particularly in the face of prior work establishing precisely the opposite. In this paper, we conduct an empirical investigation of these claims. While we likewise find that a recurrent network can achieve strong performance, we demonstrate that the use of hypernetworks is crucial to maximizing its potential. Surprisingly, when combined with hypernetworks, recurrent baselines, which are far simpler than existing specialized methods, achieve the strongest performance of all methods evaluated.
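As a rough illustration of what combining a recurrent network with a hypernetwork can look like, the following is a minimal NumPy sketch. It is not the implementation evaluated in the paper: the abstract does not specify the architecture, so everything here is an assumption, including the linear hypernetwork, the linear softmax policy head, the input layout (observation, previous action, reward), and the hypothetical names `init_params`, `rnn_step`, and `hyper_policy`. The idea shown is only that the recurrent state, which summarizes the task experience so far, is mapped to the weights of the policy rather than being concatenated to the policy input.

```python
import numpy as np


def init_params(rng, obs_dim, hidden_dim, act_dim, scale=0.1):
    """Randomly initialise a toy recurrent encoder and a linear hypernetwork.

    All shapes and the initialisation scale are illustrative assumptions.
    """
    in_dim = obs_dim + act_dim + 1  # observation, one-hot previous action, reward
    n_policy = act_dim * obs_dim + act_dim  # weights + biases of the policy head
    return {
        "W_x": rng.normal(0.0, scale, (hidden_dim, in_dim)),
        "W_h": rng.normal(0.0, scale, (hidden_dim, hidden_dim)),
        "b_h": np.zeros(hidden_dim),
        # Hypernetwork: maps the hidden state to the flat policy parameters.
        "H": rng.normal(0.0, scale, (n_policy, hidden_dim)),
        "b_H": np.zeros(n_policy),
    }


def rnn_step(params, h, x):
    """One step of a vanilla (Elman) recurrent cell over the transition input x."""
    return np.tanh(params["W_x"] @ x + params["W_h"] @ h + params["b_h"])


def hyper_policy(params, h, obs, obs_dim, act_dim):
    """Generate policy-head weights from h, then return action probabilities."""
    flat = params["H"] @ h + params["b_H"]
    W = flat[: act_dim * obs_dim].reshape(act_dim, obs_dim)
    b = flat[act_dim * obs_dim:]
    logits = W @ obs + b
    logits = logits - logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()
```

In use, `rnn_step` would be called once per environment transition within a task, and `hyper_policy` would be called with the current hidden state to select the next action. Training (for example, with a policy-gradient method across the task distribution) is omitted from this sketch.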