Robotic manipulation systems operating in complex environments rely on perception systems that provide information about the geometry (pose and 3D shape) of the objects in the scene, along with semantic information such as object labels. This information is then used to choose feasible grasps on relevant objects. In this paper, we present a novel method that simultaneously provides geometric and semantic information about all objects in the scene, as well as feasible grasps on those objects. The main advantage of our method is its speed, since it avoids sequential perception and grasp planning steps. Through detailed quantitative analysis, we show that our method achieves competitive performance compared to state-of-the-art dedicated methods for object shape, pose, and grasp prediction, while providing fast inference at 30 frames per second.