In reinforcement learning (RL), the trade-off between exploration and exploitation makes it difficult to learn efficiently from limited samples. While recent works effectively leverage past experiences for policy updates, they often overlook the potential of reusing past experiences for data collection. Independent of the underlying RL algorithm, we introduce the concept of a Contrastive Initial State Buffer, which strategically selects states from past experiences and uses them to initialize the agent in the environment, guiding it toward more informative states. We validate our approach on two complex robotic tasks without relying on any prior information about the environment: (i) locomotion of a quadruped robot traversing challenging terrains and (ii) a quadcopter drone racing through a track. The experimental results show that our initial state buffer achieves higher task performance than the nominal baseline while also speeding up training convergence.
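To make the idea of an initial state buffer concrete, the following is a minimal sketch and not the implementation described above. It stores visited states and occasionally resets the agent from one of them instead of from the default start state. All names (`InitialStateBuffer`, `sample_initial_state`, `p_buffer`) are hypothetical, and the selection rule, a greedy farthest-point pick on raw state vectors, is only an assumed stand-in for the contrastive selection the abstract refers to.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

State = Tuple[float, ...]


class InitialStateBuffer:
    """Hypothetical buffer of visited states that can serve as episode start states."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        self.capacity = capacity
        self.states: List[State] = []
        self.rng = random.Random(seed)

    def add(self, state: Sequence[float]) -> None:
        """Store a visited state; once full, overwrite a random slot."""
        item = tuple(state)
        if len(self.states) < self.capacity:
            self.states.append(item)
        else:
            self.states[self.rng.randrange(self.capacity)] = item

    def select(self, k: int) -> List[State]:
        """Pick up to k mutually distant states (assumed stand-in selection rule)."""
        if not self.states:
            return []
        chosen = [self.rng.choice(self.states)]
        while len(chosen) < min(k, len(self.states)):
            # Greedy farthest-point step: take the stored state whose nearest
            # already-chosen state is as far away as possible.
            farthest = max(
                self.states,
                key=lambda s: min(math.dist(s, c) for c in chosen),
            )
            chosen.append(farthest)
        return chosen


def sample_initial_state(
    buffer: InitialStateBuffer,
    default_state: Sequence[float],
    p_buffer: float,
    rng: random.Random,
    k: int = 4,
) -> State:
    """With probability p_buffer, start from a buffered state; otherwise use the default."""
    if buffer.states and rng.random() < p_buffer:
        return rng.choice(buffer.select(k))
    return tuple(default_state)
```

In a training loop, `add` would be called on states encountered during rollouts and `sample_initial_state` at each environment reset. Keeping a nonzero probability of starting from the default state is a design choice in this sketch so that the nominal start distribution is still covered.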