In this paper, we study the whole-body loco-manipulation problem using reinforcement learning (RL). Specifically, we focus on how to coordinate the floating base and the robotic arm of a wheeled quadrupedal manipulator robot to achieve direct six-dimensional (6D) end-effector (EE) pose tracking in task space. Unlike conventional whole-body loco-manipulation formulations that track both floating-base and end-effector commands, direct EE pose tracking requires an inherent balance among the redundant degrees of freedom of the whole-body motion. We leverage RL to solve this challenging problem. To address the associated difficulties, we develop a novel reward fusion module (RFM) that systematically integrates reward terms corresponding to different tasks in a nonlinear manner. In this way, the inherently multi-stage and hierarchical structure of the loco-manipulation problem can be carefully accommodated. By combining the proposed RFM with a teacher-student RL training paradigm, we present a complete RL scheme that achieves 6D EE pose tracking for the wheeled quadrupedal manipulator robot. Extensive simulation and hardware experiments demonstrate the significance of the RFM. In particular, we achieve smooth and precise tracking performance, with a state-of-the-art position tracking error of less than 5 cm and a rotation error of less than 0.1 rad. Please refer to https://clearlab-sustech.github.io/RFM_loco_mani/ for more experimental videos.