A key challenge in lifelong reinforcement learning (RL) is the loss of plasticity, where previous learning progress hinders an agent's adaptation to new tasks. While regularization and resetting can help, they require precise hyperparameter selection at the outset and environment-dependent adjustments. Building on the principled theory of online convex optimization, we present a parameter-free optimizer for lifelong RL, called TRAC, which requires no tuning or prior knowledge about the distribution shifts. Extensive experiments on Procgen, Atari, and Gym Control environments show that TRAC works surprisingly well, mitigating loss of plasticity and rapidly adapting to challenging distribution shifts, despite the underlying optimization problem being nonconvex and nonstationary.