Robots often rely on a repertoire of previously learned motion policies to perform tasks of varying complexity. When facing unseen task conditions or new task requirements, robots must adapt their motion policies accordingly. In this context, policy optimization is the \emph{de facto} paradigm for adapting robot policies as a function of task-specific objectives. Most commonly used motion policies carry particular structures that are often overlooked in policy optimization algorithms. We instead propose to leverage the structure of probabilistic policies by casting policy optimization as an optimal transport problem. Specifically, we focus on robot motion policies that build on Gaussian mixture models (GMMs) and formulate policy optimization as a Wasserstein gradient flow over the space of GMMs. This naturally allows us to constrain the policy updates via the $L^2$-Wasserstein distance between GMMs, which enhances the stability of the policy optimization process. Furthermore, we leverage the geometry of the Bures-Wasserstein manifold to optimize the Gaussian distributions of the GMM policy via Riemannian optimization. We evaluate our approach on common robotic settings: reaching motions, collision-avoidance behaviors, and multi-goal tasks. Our results show that our method outperforms common policy optimization baselines, achieving higher task success rates and lower-variance solutions.
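To make the distance underlying the Bures-Wasserstein geometry concrete, the following is a minimal sketch of the well-known closed-form squared $L^2$-Wasserstein distance between two Gaussians, $W_2^2 = \|m_1 - m_2\|^2 + \mathrm{tr}\big(\Sigma_1 + \Sigma_2 - 2(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}\big)$. It covers only a single pair of Gaussian components, not the distance between full GMMs used in the paper, and the function names `sqrtm_psd` and `gaussian_w2_squared` are hypothetical, chosen for illustration only.

```python
import numpy as np


def sqrtm_psd(a):
    """Square root of a symmetric positive semi-definite matrix (via eigh)."""
    a = 0.5 * (a + a.T)  # symmetrize to suppress round-off asymmetry
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)  # guard against tiny negative eigenvalues
    return (v * np.sqrt(w)) @ v.T


def gaussian_w2_squared(m1, s1, m2, s2):
    """Squared L2-Wasserstein distance between N(m1, s1) and N(m2, s2).

    Hypothetical helper for illustration; not the authors' implementation.
    """
    m1, m2 = np.asarray(m1, dtype=float), np.asarray(m2, dtype=float)
    s1, s2 = np.asarray(s1, dtype=float), np.asarray(s2, dtype=float)
    s1_half = sqrtm_psd(s1)
    cross = sqrtm_psd(s1_half @ s2 @ s1_half)
    # Bures term: distance between the two covariance matrices
    bures = np.trace(s1 + s2 - 2.0 * cross)
    return float(np.sum((m1 - m2) ** 2) + bures)
```

For example, two Gaussians with identical covariances differ only by the squared Euclidean distance between their means, while two zero-mean isotropic Gaussians in $d$ dimensions with standard deviations $\sigma_1$ and $\sigma_2$ are separated by $d(\sigma_1 - \sigma_2)^2$.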