Causal Policy Gradient for Whole-Body Mobile Manipulation

Developing the next generation of household robot helpers requires combining locomotion and interaction capabilities, a combination generally referred to as mobile manipulation (MoMa). MoMa tasks are difficult because of the robot's large action space and the commonly multi-objective nature of the task, e.g., efficiently reaching a goal while avoiding obstacles. Current approaches often split tasks into navigation without manipulation and stationary manipulation without locomotion by manually matching parts of the action space to MoMa sub-objectives (e.g., base actions for locomotion objectives and arm actions for manipulation objectives). This solution prevents simultaneous combinations of locomotion and interaction degrees of freedom and requires human domain knowledge both for partitioning the action space and for matching the action parts to the sub-objectives. In this paper, we introduce Causal MoMa, a new framework for training policies for typical MoMa tasks that uses the most favorable subspace of the robot's action space to address each sub-objective. Causal MoMa automatically discovers the causal dependencies between actions and terms of the reward function and exploits these dependencies in a causal policy learning procedure that reduces gradient variance compared to previous state-of-the-art policy gradient algorithms, improving convergence and task performance. We evaluate Causal MoMa on three types of simulated robots across different MoMa tasks and demonstrate that policies trained in simulation transfer directly to a real robot, where our agent follows moving goals and reacts to dynamic obstacles while simultaneously and synergistically controlling the whole body: base, arm, and head. More information at https://sites.google.com/view/causal-moma.

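To make the idea of exploiting action-reward dependencies concrete, the following minimal sketch shows one way a binary causal mask could restrict which reward terms contribute to the advantage of each action dimension. It is only an illustration under stated assumptions: the abstract does not specify how the dependencies are discovered or the exact form of the gradient estimator, and the mask values, the reward-term names, and the function names (`per_dimension_advantages`, `surrogate_loss`) are hypothetical.

```python
from typing import List, Sequence


def per_dimension_advantages(
    causal_mask: Sequence[Sequence[int]],
    term_advantages: Sequence[float],
) -> List[float]:
    """Sum, for each action dimension, only the advantages of the reward
    terms that the dimension is assumed to causally influence.

    causal_mask[i][j] is 1 if action dimension i affects reward term j.
    """
    for row in causal_mask:
        if len(row) != len(term_advantages):
            raise ValueError("each mask row needs one entry per reward term")
    return [
        sum(m * a for m, a in zip(row, term_advantages))
        for row in causal_mask
    ]


def surrogate_loss(
    log_probs: Sequence[float],
    dim_advantages: Sequence[float],
) -> float:
    """Policy-gradient surrogate for a policy factorized over action
    dimensions: each log-probability is weighted by its own advantage."""
    return -sum(lp * adv for lp, adv in zip(log_probs, dim_advantages))


# Hypothetical mask. Rows: base, arm, head.
# Columns: reach-goal, avoid-collision, look-at-goal reward terms.
CAUSAL_MASK = [
    [1, 1, 0],  # base
    [1, 1, 0],  # arm
    [0, 0, 1],  # head
]

# Made-up per-term advantage estimates for a single time step.
term_advantages = [0.5, -2.0, 1.0]

dim_advantages = per_dimension_advantages(CAUSAL_MASK, term_advantages)
print(dim_advantages)  # [-1.5, -1.5, 1.0]
```

A standard policy gradient would weight every action dimension by the summed advantage (here -0.5), so the head would be penalized for a collision it could not have caused. Masking out the unrelated terms removes that noise from each dimension's learning signal, which is the intuition behind the variance reduction described in the abstract.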