Many locomotion controllers based on reinforcement learning (RL) have been designed to enable blind quadrupedal locomotion over challenging terrains. Nevertheless, locomotion control remains difficult for quadruped robots that traverse diverse terrains under unforeseen disturbances. Recently, privileged learning with a teacher-student architecture has been used to learn reliable and robust quadrupedal locomotion over various terrains. However, its single-encoder structure is inadequate for handling external force perturbations. Moreover, the student policy inevitably suffers performance degradation because of the discrepancy between the feature embeddings produced by the teacher policy's encoder and those produced by the student policy's encoder. This paper therefore presents a privileged learning framework with multiple feature encoders and a residual policy network for robust and reliable quadruped locomotion under various external perturbations. The multi-encoder structure decouples the latent features derived from different types of privileged information, which improves the robustness, stability, and reliability of the learned policy. The efficiency of the proposed feature encoding module is analyzed in depth using extensive simulation data. The residual policy network helps mitigate the performance degradation experienced by a student policy that attempts to clone the behavior of the teacher policy. The proposed framework is evaluated on a Unitree GO1 robot, where extensive experiments on diverse terrains show improved performance over the state-of-the-art privileged learning algorithm. Ablation studies illustrate the effectiveness of the residual policy network.
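To make the two architectural ideas concrete, the following is a minimal sketch of a teacher with one encoder per privileged-information group and a student whose action is a cloned base action plus a residual correction. It is illustrative only, not the paper's implementation: the class names, the group names (`terrain`, `force`), all dimensions, the single-layer random networks, and the additive combination of base and residual actions are assumptions, since the abstract does not specify them.

```python
import math
import random


def linear(n_in, n_out, rng):
    """Random weight matrix (n_out rows, n_in columns); stands in for a trained layer."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def apply(weights, x, activation=None):
    """Compute activation(W @ x) for a single input vector x."""
    out = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return [activation(v) for v in out] if activation else out


class MultiEncoderPolicy:
    """Teacher-side sketch: one encoder per privileged-information group."""

    def __init__(self, obs_dim, priv_dims, latent_dim, act_dim, seed=0):
        rng = random.Random(seed)
        # Assumption: each privileged group (e.g. terrain, external force)
        # gets its own encoder, so its latent features stay decoupled.
        self.encoders = {
            name: linear(dim, latent_dim, rng) for name, dim in priv_dims.items()
        }
        self.actor = linear(obs_dim + latent_dim * len(priv_dims), act_dim, rng)

    def encode(self, priv):
        # Latents are computed separately, then concatenated in a fixed order.
        latent = []
        for name, weights in self.encoders.items():
            latent += apply(weights, priv[name], math.tanh)
        return latent

    def act(self, obs, priv):
        return apply(self.actor, obs + self.encode(priv))


class ResidualStudent:
    """Student-side sketch: a cloned base action plus a learned correction."""

    def __init__(self, obs_dim, hist_dim, latent_dim, act_dim, seed=1):
        rng = random.Random(seed)
        # The student has no privileged inputs; it estimates a latent
        # from its observation history instead (hypothetical layout).
        self.hist_encoder = linear(hist_dim, latent_dim, rng)
        self.base_actor = linear(obs_dim + latent_dim, act_dim, rng)
        self.residual = linear(obs_dim + latent_dim, act_dim, rng)

    def act(self, obs, history):
        z = apply(self.hist_encoder, history, math.tanh)
        base = apply(self.base_actor, obs + z)
        delta = apply(self.residual, obs + z)
        # Assumption: the residual is combined additively with the base action.
        return [b + d for b, d in zip(base, delta)]
```

In this sketch, `MultiEncoderPolicy(obs_dim=6, priv_dims={"terrain": 4, "force": 3}, latent_dim=2, act_dim=12)` yields a 4-dimensional concatenated latent and a 12-dimensional action. Keeping the residual as a separate head means the cloned base behavior can be left untouched while only the correction is trained, which is one plausible reading of how a residual network could offset the embedding discrepancy.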