The emergence of differentiable simulators that enable analytic gradient computation has motivated a new wave of learning algorithms that could significantly increase sample efficiency over traditional Reinforcement Learning (RL) methods. Recent research has demonstrated performance gains in scenarios with comparatively smooth dynamics and, thus, smooth optimization landscapes. However, research on leveraging differentiable simulators for contact-rich scenarios, such as legged locomotion, is scarce. This may be attributed to the discontinuous nature of contact, which introduces several challenges to optimization with analytic gradients. The purpose of this paper is to determine whether analytic gradients can be beneficial even in the presence of contact. Our investigation focuses on the effects of different soft and hard contact models on the learning process, examining optimization challenges through the lens of contact simulation. We demonstrate the viability of employing analytic gradients to learn physically plausible locomotion skills with a quadrupedal robot using Short-Horizon Actor-Critic (SHAC), a learning algorithm that leverages analytic gradients. We compare it with a state-of-the-art RL algorithm, Proximal Policy Optimization (PPO), to understand the benefits of analytic gradients.