Deep reinforcement learning (RL) is notoriously impractical to deploy due to sample inefficiency. Meta-RL directly addresses this inefficiency by learning to perform few-shot learning, given a distribution of related tasks for meta-training. While many specialized meta-RL methods have been proposed, recent work suggests that end-to-end learning in conjunction with an off-the-shelf sequential model, such as a recurrent network, is a surprisingly strong baseline. However, such claims have been controversial due to limited supporting evidence, particularly in the face of prior work establishing precisely the opposite. In this paper, we conduct an empirical investigation of these claims. While we likewise find that a recurrent network can achieve strong performance, we demonstrate that the use of hypernetworks is crucial to realizing its full potential. Surprisingly, when combined with hypernetworks, recurrent baselines, which are far simpler than existing specialized methods, achieve the strongest performance of all methods evaluated.
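To make the combination of a recurrent network and a hypernetwork concrete, the following is a minimal sketch and not the architecture evaluated in this paper. It assumes a plain tanh recurrent cell that summarizes the interaction history (observation, previous action, previous reward), and a single linear hypernetwork that maps the recurrent state to the weights and bias of a linear policy head. The class name `HyperRecurrentPolicy`, the layer sizes, the choice of inputs, and the random untrained weights are all illustrative assumptions.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Random (untrained) weight matrix stored as a list of rows."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class HyperRecurrentPolicy:
    """Hypothetical sketch: an RNN state conditions a policy via a hypernetwork."""

    def __init__(self, obs_dim, hidden_dim, n_actions, seed=0):
        rng = random.Random(seed)
        self.obs_dim = obs_dim
        self.hidden_dim = hidden_dim
        self.n_actions = n_actions
        # Assumed recurrent input: observation, one-hot previous action, previous reward.
        in_dim = obs_dim + n_actions + 1
        self.w_in = make_matrix(hidden_dim, in_dim, rng)
        self.w_rec = make_matrix(hidden_dim, hidden_dim, rng)
        # Hypernetwork: recurrent state -> flattened weights and bias of the policy head.
        n_head_params = n_actions * obs_dim + n_actions
        self.w_hyper = make_matrix(n_head_params, hidden_dim, rng)

    def initial_state(self):
        return [0.0] * self.hidden_dim

    def step(self, state, obs, prev_action, prev_reward):
        # prev_action=None (first step) yields an all-zero one-hot vector.
        onehot = [1.0 if a == prev_action else 0.0 for a in range(self.n_actions)]
        x = list(obs) + onehot + [prev_reward]
        pre = [a + b for a, b in zip(matvec(self.w_in, x), matvec(self.w_rec, state))]
        new_state = [math.tanh(p) for p in pre]

        # The hypernetwork generates the policy head from the task summary.
        flat = matvec(self.w_hyper, new_state)
        n_weights = self.n_actions * self.obs_dim
        head_w = [flat[i * self.obs_dim:(i + 1) * self.obs_dim]
                  for i in range(self.n_actions)]
        head_b = flat[n_weights:]

        # The generated head maps the current observation to action probabilities.
        logits = [l + b for l, b in zip(matvec(head_w, obs), head_b)]
        top = max(logits)
        exps = [math.exp(l - top) for l in logits]
        total = sum(exps)
        return new_state, [e / total for e in exps]
```

In this sketch the history influences the policy only through the generated head parameters, rather than being concatenated to the observation as an extra input. That is the distinction the hypernetwork introduces; a trained version would learn `w_in`, `w_rec`, and `w_hyper` end to end.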