In reinforcement learning (RL), the trade-off between exploration and exploitation makes it difficult to learn efficiently from limited samples. While recent works effectively leverage past experiences for policy updates, they often overlook the potential of reusing past experiences for data collection. We introduce the Contrastive Initial State Buffer, a concept independent of the underlying RL algorithm, which strategically selects states from past experiences and uses them to initialize the agent in the environment, guiding it toward more informative states. We validate our approach on two complex robotic tasks without relying on any prior information about the environment: (i) locomotion of a quadruped robot traversing challenging terrains and (ii) a quadcopter drone racing through a track. The experimental results show that our initial state buffer achieves higher task performance than the nominal baseline while also speeding up training convergence.
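The idea of initializing the agent from previously visited states can be illustrated with a minimal sketch. It is not the method itself: the abstract does not specify how the contrastive selection works, so uniform random sampling is used here as a stand-in. The class name `InitialStateBuffer` and the parameters `capacity` and `reuse_prob` are hypothetical and chosen only for illustration.

```python
import random
from collections import deque


class InitialStateBuffer:
    """Stores visited states and proposes some of them as episode start states.

    Illustrative only: selection is uniform at random, which is an assumption
    standing in for the contrastive selection named in the abstract.
    """

    def __init__(self, capacity=10_000, reuse_prob=0.5, seed=None):
        self.states = deque(maxlen=capacity)  # oldest states are dropped first
        self.reuse_prob = reuse_prob          # chance of starting from a stored state
        self.rng = random.Random(seed)

    def add(self, state):
        """Record a state encountered during a rollout."""
        self.states.append(state)

    def sample_initial_state(self, default_state):
        """Return a stored state with probability reuse_prob, else default_state."""
        if self.states and self.rng.random() < self.reuse_prob:
            index = self.rng.randrange(len(self.states))
            return self.states[index]
        return default_state
```

In a training loop, states seen during rollouts would be passed to `add`, and each environment reset would call `sample_initial_state` with the nominal start state as the fallback. Because the buffer only affects where episodes begin, it can sit in front of any RL algorithm, provided the environment supports resetting to an arbitrary state.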