Causal Policy Gradient for Whole-Body Mobile Manipulation

Developing the next generation of household robot helpers requires combining locomotion and interaction capabilities, a combination generally referred to as mobile manipulation (MoMa). MoMa tasks are difficult due to the large action space of the robot and the typically multi-objective nature of the task, e.g., efficiently reaching a goal while avoiding obstacles. Current approaches often segregate tasks into navigation without manipulation and stationary manipulation without locomotion by manually matching parts of the action space to MoMa sub-objectives (e.g., base actions for locomotion objectives and arm actions for manipulation objectives). This solution prevents simultaneous combinations of locomotion and interaction degrees of freedom and requires human domain knowledge for both partitioning the action space and matching the action parts to the sub-objectives. In this paper, we introduce Causal MoMa, a new framework to train policies for typical MoMa tasks that makes use of the most favorable subspace of the robot's action space to address each sub-objective. Causal MoMa automatically discovers the causal dependencies between actions and terms of the reward function and exploits these dependencies in a causal policy learning procedure that reduces gradient variance compared to previous state-of-the-art policy gradient algorithms, improving convergence and results. We evaluate the performance of Causal MoMa on three types of simulated robots across different MoMa tasks and demonstrate success in transferring the policies trained in simulation directly to a real robot, where our agent is able to follow moving goals and react to dynamic obstacles while simultaneously and synergistically controlling the whole body: base, arm, and head. More information is available at https://sites.google.com/view/causal-moma.
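
The abstract does not spell out the update rule, so the following is only a minimal sketch of how dependencies between actions and reward terms could enter a policy gradient; it should not be read as the authors' implementation. It assumes a binary causal mask (here hand-written and hypothetical, whereas the paper discovers it automatically), one advantage estimate per reward term, and a policy that factorizes over action dimensions. Each action dimension is then weighted only by the advantages of the reward terms it is assumed to influence, which is one way irrelevant terms could be kept from adding variance to that dimension's gradient. The names `per_dimension_advantages`, `surrogate_loss`, and `CAUSAL_MASK` are illustrative.

```python
from typing import List


def per_dimension_advantages(
    term_advantages: List[List[float]],
    causal_mask: List[List[int]],
) -> List[List[float]]:
    """Combine per-reward-term advantages into one advantage per action dimension.

    term_advantages[t][j]: advantage estimate for reward term j at timestep t.
    causal_mask[i][j]: 1 if action dimension i is assumed to influence
        reward term j, else 0 (hypothetical, hand-written here).
    Returns adv[t][i] = sum_j causal_mask[i][j] * term_advantages[t][j].
    """
    n_terms = len(causal_mask[0])
    result = []
    for row in term_advantages:
        if len(row) != n_terms:
            raise ValueError("each timestep needs one advantage per reward term")
        result.append(
            [sum(m * a for m, a in zip(mask_row, row)) for mask_row in causal_mask]
        )
    return result


def surrogate_loss(
    log_probs: List[List[float]],
    dim_advantages: List[List[float]],
) -> float:
    """Negative policy-gradient surrogate for a factorized policy.

    log_probs[t][i]: log-probability of the i-th action dimension at timestep t.
    Each dimension's log-probability is weighted by its own masked advantage,
    and the result is averaged over timesteps.
    """
    total = 0.0
    for lp_row, adv_row in zip(log_probs, dim_advantages):
        total += sum(lp * adv for lp, adv in zip(lp_row, adv_row))
    return -total / len(log_probs)


# Toy setup (assumed): 3 action dimensions (base, arm, head) and
# 2 reward terms (reach goal, avoid obstacles).
CAUSAL_MASK = [
    [1, 1],  # base: assumed to affect both terms
    [1, 0],  # arm: assumed to affect reaching only
    [0, 0],  # head: assumed to affect neither term in this toy example
]
TERM_ADVANTAGES = [[0.5, -1.0], [2.0, 0.25]]  # two timesteps
LOG_PROBS = [[-1.0, -2.0, -3.0], [-0.5, -0.5, -0.5]]

DIM_ADVANTAGES = per_dimension_advantages(TERM_ADVANTAGES, CAUSAL_MASK)
LOSS = surrogate_loss(LOG_PROBS, DIM_ADVANTAGES)
print(DIM_ADVANTAGES, LOSS)
```

In this toy setup the arm's gradient ignores the obstacle term and the head receives no learning signal at all, whereas an unmasked policy gradient would weight all three dimensions by the summed advantage.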
