Self-supervision has the potential to transform reinforcement learning (RL), paralleling the breakthroughs it has enabled in other areas of machine learning. While self-supervised learning in other domains aims to find patterns in a fixed dataset, self-supervised goal-conditioned reinforcement learning (GCRL) agents discover new behaviors by learning from the goals achieved during unstructured interaction with the environment. However, these methods have failed to see similar success, due both to a lack of data from slow environments and a lack of stable algorithms. We take a step toward addressing both of these issues by releasing a high-performance codebase and benchmark, JaxGCRL, for self-supervised GCRL, enabling researchers to train agents for millions of environment steps in minutes on a single GPU. The key to this performance is a combination of GPU-accelerated environments and a stable, batched version of the contrastive reinforcement learning algorithm, based on an InfoNCE objective, that effectively makes use of this increased data throughput. With this approach, we provide a foundation for future research in self-supervised GCRL, enabling researchers to quickly iterate on new ideas and evaluate them in a diverse set of challenging environments. Website + Code: https://github.com/MichalBortkiewicz/JaxGCRL
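The batched InfoNCE objective mentioned above can be illustrated with a minimal NumPy sketch: state-action embeddings and goal embeddings are compared pairwise across the batch, with each diagonal entry treated as the positive pair. The function name and array shapes here are illustrative, not the actual JaxGCRL API.

```python
import numpy as np

def infonce_loss(phi: np.ndarray, psi: np.ndarray) -> float:
    """InfoNCE-style contrastive loss over a batch (illustrative sketch).

    phi: (B, D) state-action embeddings
    psi: (B, D) goal embeddings; row i of psi is the positive for row i of phi
    """
    logits = phi @ psi.T                                  # (B, B) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    # log-probability assigned to the correct (diagonal) goal for each row
    log_probs = np.diag(logits) - np.log(np.exp(logits).sum(axis=1))
    return float(-log_probs.mean())
```

When embeddings of matching state-action/goal pairs are well aligned, the diagonal dominates each row and the loss approaches zero; with uninformative embeddings the loss is log(B), the entropy of a uniform guess over the batch.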