In recent years, by leveraging more data, computation, and diverse tasks, learned optimizers have achieved remarkable success in supervised learning, outperforming classical hand-designed optimizers. However, in practice, these learned optimizers fail to generalize to reinforcement learning tasks because of the unstable and complex loss landscapes of those tasks. Moreover, neither hand-designed nor learned optimizers have been designed specifically to address the unique optimization properties of reinforcement learning. In this work, we take a data-driven approach and use meta-learning to learn an optimizer for reinforcement learning. We introduce a novel optimizer structure that significantly improves the training efficiency of learned optimizers, making it possible to learn an optimizer for reinforcement learning from scratch. Although trained on toy tasks, our learned optimizer generalizes to unseen complex tasks. Finally, we design a set of small gridworlds to train the first general-purpose optimizer for reinforcement learning.