On-policy reinforcement learning (RL) algorithms have shown great potential in robotic control, where effective exploration is crucial for efficient, high-quality policy learning. However, encouraging the agent to explore better trajectories efficiently remains a challenge. Most existing methods incentivize exploration by maximizing policy entropy or by rewarding visits to novel states, regardless of the potential value of those states. We propose a new form of directed exploration that uses analytical policy gradients from a differentiable dynamics model to inject task-aware, physics-based guidance, steering the agent toward high-reward regions for faster and more effective policy learning.
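To make the term "analytical policy gradient" concrete, the following is a minimal sketch on a toy problem, not the method proposed here. Everything in it is an assumption for illustration: a one-dimensional system `x' = x + dt * theta * x`, a linear policy with a single parameter `theta`, a reward of `-x**2` at each step, and the hypothetical helper name `rollout_return_and_grad`. Because the dynamics are differentiable, the gradient of the return with respect to `theta` can be obtained by applying the chain rule through every simulation step, rather than by estimating it from sampled returns.

```python
def rollout_return_and_grad(theta, x0=1.0, dt=0.1, horizon=20):
    """Return (J, dJ/dtheta) for a toy differentiable system.

    Assumed toy setup (not from the paper):
      dynamics: x_next = x + dt * action
      policy:   action = theta * x
      reward:   -x_next ** 2  (drive the state to the origin)
    """
    x = x0
    dx_dtheta = 0.0  # sensitivity of the state to the policy parameter
    ret = 0.0
    grad = 0.0
    for _ in range(horizon):
        action = theta * x
        # Chain rule through one dynamics step:
        # d(x + dt * theta * x)/dtheta = dx + dt * (x + theta * dx)
        dx_dtheta = dx_dtheta + dt * (x + theta * dx_dtheta)
        x = x + dt * action
        ret += -x * x
        grad += -2.0 * x * dx_dtheta
    return ret, grad


if __name__ == "__main__":
    theta = -0.5
    ret, grad = rollout_return_and_grad(theta)

    # Sanity check against a central finite difference.
    eps = 1e-6
    fd = (rollout_return_and_grad(theta + eps)[0]
          - rollout_return_and_grad(theta - eps)[0]) / (2 * eps)
    print("analytical gradient:", grad)
    print("finite-difference gradient:", fd)

    # One gradient-ascent step moves the policy toward higher return.
    new_ret, _ = rollout_return_and_grad(theta + 0.01 * grad)
    print("return before:", ret, "after:", new_ret)
```

In this sketch the gradient points directly toward parameters with higher return, which is the kind of task-aware signal that entropy or novelty bonuses do not provide. How such a signal is combined with an on-policy update is not specified in the abstract and is not modeled here.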