In mobile manipulation, selecting an optimal mobile base pose is essential for successful object grasping. Previous works have addressed this problem either through classical planning methods or by learning state-based policies, both of which assume access to reliable state information such as precise object poses and environment models. In this work, we study base pose planning directly from top-down orthographic projections of the scene, which provide a global overview while preserving spatial structure. We propose VBM-NET, a learning-based method for base pose selection from such top-down orthographic projections. We use an equivariant TransporterNet to exploit spatial symmetries and efficiently learn candidate base poses for grasping. Further, we use graph neural networks to represent a varying number of candidate base poses and reinforcement learning to determine the optimal base pose among them. We show that VBM-NET produces solutions comparable to those of classical methods in significantly less computation time. Furthermore, we validate sim-to-real transfer by successfully deploying a policy trained in simulation for real-world mobile manipulation.
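To make the described pipeline concrete, below is a minimal sketch of the data flow implied by the abstract: a top-down orthographic projection is scored, a varying number of candidate base positions are kept as graph nodes, and one candidate is selected. This is an illustrative assumption written in PyTorch, not the authors' implementation: the fully-convolutional `CandidateScorer` stands in for the equivariant TransporterNet, the dense message-passing `CandidateGNN` stands in for the paper's graph network, and the final argmax stands in for the RL-trained selection; all names and dimensions are invented for the example.

```python
# Illustrative sketch only: untrained modules standing in for the components
# named in the abstract (equivariant TransporterNet, GNN, RL-based selection).
import torch
import torch.nn as nn
import torch.nn.functional as F


class CandidateScorer(nn.Module):
    """Fully-convolutional scorer over the top-down orthographic projection.

    Stand-in for the equivariant TransporterNet: outputs a per-pixel score map
    from which candidate base positions are extracted.
    """

    def __init__(self, in_channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),
        )

    def forward(self, top_down: torch.Tensor) -> torch.Tensor:
        return self.net(top_down).squeeze(1)  # (B, H, W) score map


class CandidateGNN(nn.Module):
    """One round of dense message passing over the candidate base poses,
    followed by a per-node value head; the argmax over these values plays
    the role of the RL-trained selection among candidates."""

    def __init__(self, feat_dim: int = 3, hidden: int = 64):
        super().__init__()
        self.encode = nn.Linear(feat_dim, hidden)
        self.message = nn.Linear(hidden, hidden)
        self.value = nn.Linear(hidden, 1)

    def forward(self, node_feats: torch.Tensor) -> torch.Tensor:
        # node_feats: (N, feat_dim), one node per candidate base pose.
        h = F.relu(self.encode(node_feats))
        # Fully-connected graph: every candidate aggregates messages from all others.
        msg = F.relu(self.message(h)).mean(dim=0, keepdim=True)
        h = h + msg
        return self.value(h).squeeze(-1)  # (N,) per-candidate values


def select_base_pose(top_down: torch.Tensor, k: int = 8) -> torch.Tensor:
    """Score the scene, keep the k best candidate cells, and pick one of them."""
    scorer, gnn = CandidateScorer(top_down.shape[1]), CandidateGNN()  # random weights
    scores = scorer(top_down)[0]                                      # (H, W)
    top_vals, top_idx = torch.topk(scores.flatten(), k)
    ys = torch.div(top_idx, scores.shape[1], rounding_mode="floor")
    xs = top_idx % scores.shape[1]
    # Node features: pixel coordinates plus the candidate's score.
    nodes = torch.stack([xs.float(), ys.float(), top_vals], dim=-1)
    best = gnn(nodes).argmax()
    return torch.stack([xs[best], ys[best]])                          # chosen (x, y) cell


if __name__ == "__main__":
    projection = torch.rand(1, 3, 64, 64)  # dummy top-down orthographic projection
    print("selected base cell:", select_base_pose(projection))
```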