A key challenge in lifelong reinforcement learning (RL) is the loss of plasticity, where previous learning progress hinders an agent's adaptation to new tasks. While regularization and resetting can help, they require precise hyperparameter selection at the outset and environment-dependent adjustments. Building on the principled theory of online convex optimization, we present a parameter-free optimizer for lifelong RL, called TRAC, which requires no tuning or prior knowledge about the distribution shifts. Extensive experiments on Procgen, Atari, and Gym Control environments show that TRAC works surprisingly well, mitigating loss of plasticity and rapidly adapting to challenging distribution shifts, despite the underlying optimization problem being nonconvex and nonstationary.