Unsupervised pre-training can equip reinforcement learning agents with prior knowledge and accelerate learning in downstream tasks. A promising direction, grounded in human development, investigates agents that learn by setting and pursuing their own goals. The core challenge lies in how to effectively generate, select, and learn from such goals. Our focus is on broad distributions of downstream tasks where solving every task zero-shot is infeasible. Such settings naturally arise when the target tasks lie outside of the pre-training distribution or when their identities are unknown to the agent. In this work, we (i) optimize for efficient multi-episode exploration and adaptation within a meta-learning framework, and (ii) guide the training curriculum with evolving estimates of the agent's post-adaptation performance. We present ULEE, an unsupervised meta-learning method that combines an in-context learner with an adversarial goal-generation strategy that maintains training at the frontier of the agent's capabilities. On XLand-MiniGrid benchmarks, ULEE pre-training yields improved exploration and adaptation abilities that generalize to novel objectives, environment dynamics, and map structures. The resulting policy attains improved zero-shot and few-shot performance, and provides a strong initialization for longer fine-tuning processes. It outperforms learning from scratch, DIAYN pre-training, and alternative curricula. Code is available at: https://github.com/Octavio-Pappalardo/ulee-jax
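To make the curriculum idea concrete, the following is a minimal sketch of how training goals could be kept near the frontier of an agent's capabilities using running estimates of post-adaptation success. It is not ULEE's adversarial goal generator: it swaps in a simple hand-written weighting heuristic, and the function names (`frontier_weights`, `update_estimate`, `sample_goal`), the target success level of 0.5, and the `sharpness` and `rate` parameters are all assumptions made for illustration.

```python
import random


def frontier_weights(success_estimates, target=0.5, sharpness=4.0):
    """Return normalized sampling weights that peak where the estimated
    post-adaptation success is closest to `target` (hypothetical heuristic)."""
    span = max(target, 1.0 - target)
    weights = []
    for p in success_estimates:
        # Distance from the target, rescaled to [0, 1], then sharpened.
        closeness = max(0.0, 1.0 - abs(p - target) / span)
        weights.append(closeness ** sharpness)
    total = sum(weights)
    if total == 0.0:
        # Every goal is either trivial or hopeless: fall back to uniform.
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def update_estimate(old_estimate, outcome, rate=0.1):
    """Exponential moving average of post-adaptation outcomes (1.0 = success)."""
    return (1.0 - rate) * old_estimate + rate * outcome


def sample_goal(goal_ids, success_estimates, rng):
    """Draw one goal, favoring those of intermediate estimated difficulty."""
    weights = frontier_weights(success_estimates)
    return rng.choices(goal_ids, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    goals = ["a", "b", "c"]
    estimates = [0.05, 0.45, 0.95]  # made-up values, for illustration only
    print(sample_goal(goals, estimates, rng))
```

In this sketch, goals the agent almost always solves or almost never solves after adaptation receive little weight, so sampling concentrates on goals of intermediate difficulty. A learned or adversarial generator, as the abstract describes, would replace the fixed weighting rule.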