Humans can explicitly draw on past experiences when learning new tasks and apply those experiences accordingly. We believe this capacity for self-referencing is especially advantageous for reinforcement learning agents in the unsupervised pretrain-then-finetune setting. During pretraining, an agent's past experiences can be used explicitly to mitigate the nonstationarity of intrinsic rewards. During finetuning, referencing historical trajectories prevents the unlearning of valuable exploratory behaviors. Motivated by these benefits, we propose the Self-Reference (SR) approach, an add-on module explicitly designed to leverage historical information and enhance agent performance within the pretrain-finetune paradigm. Among model-free methods on the Unsupervised Reinforcement Learning Benchmark, our approach achieves state-of-the-art results in both Interquartile Mean (IQM) performance and Optimality Gap, recording an 86% IQM and a 16% Optimality Gap. Additionally, it improves current algorithms by up to 17% IQM and reduces the Optimality Gap by 31%. Beyond these performance gains, the Self-Reference add-on also increases sample efficiency, a crucial attribute for real-world applications.