Plasticity, the ability of a neural network to keep adapting to new data, is crucial for high-performance, sample-efficient visual reinforcement learning (VRL). Although methods such as resetting and regularization may mitigate plasticity loss, how the individual components of the VRL framework affect the agent's plasticity remains poorly understood. In this work, we conduct a systematic empirical study of three underexplored facets and draw the following conclusions: (1) data augmentation is essential for maintaining plasticity; (2) the critic's plasticity loss is the principal bottleneck impeding efficient training; and (3) without timely intervention to recover the critic's plasticity in the early stages, its loss becomes catastrophic. These insights suggest a novel strategy for addressing the high replay ratio (RR) dilemma, in which exacerbated plasticity loss hinders the sample-efficiency gains that more frequent data reuse could otherwise bring. Rather than fixing a static RR for the entire training process, we propose Adaptive RR, which dynamically adjusts the RR based on the critic's plasticity level. Extensive evaluations indicate that Adaptive RR not only avoids catastrophic plasticity loss in the early stages but also benefits from more frequent reuse in later phases, resulting in superior sample efficiency.
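The idea of adjusting the RR from the critic's plasticity level can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not specified in the text above: the plasticity proxy (the share of positive activations in a batch of critic features), the names `fraction_active` and `AdaptiveRR`, the two RR values, the tolerance, and the one-way switch from a low to a high RR once the proxy stops changing between checks.

```python
from dataclasses import dataclass
from typing import List, Optional


def fraction_active(activations: List[List[float]]) -> float:
    """Hypothetical plasticity proxy: share of positive entries in a
    batch of post-activation features (rows = samples, columns = units)."""
    total = sum(len(row) for row in activations)
    if total == 0:
        raise ValueError("need at least one activation value")
    positive = sum(1 for row in activations for a in row if a > 0.0)
    return positive / total


@dataclass
class AdaptiveRR:
    """Illustrative controller: keep a low replay ratio while the critic's
    plasticity proxy is still changing, then switch to a high one."""
    low_rr: float = 0.5    # assumed placeholder value
    high_rr: float = 2.0   # assumed placeholder value
    tol: float = 1e-3      # assumed threshold on the change between checks
    _prev: Optional[float] = None
    _switched: bool = False

    def update(self, plasticity: float) -> float:
        """Record a new plasticity reading and return the RR to use next."""
        if not self._switched and self._prev is not None:
            # The proxy has stabilised: assume the early risk has passed.
            if abs(plasticity - self._prev) < self.tol:
                self._switched = True
        self._prev = plasticity
        return self.high_rr if self._switched else self.low_rr
```

In a training loop, one would call `update` at a fixed interval with the proxy computed on a batch of critic activations and use the returned value as the number of gradient updates per environment step until the next check. The switch is one-way in this sketch; whether the RR should ever drop back is a design choice the text above does not settle.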