While grasp detection is an important part of any robotic manipulation pipeline, reliable and accurate grasp detection in $SE(3)$ remains a research challenge. Many robotics applications in unstructured environments, such as homes or warehouses, would benefit substantially from better grasp performance. This paper proposes a novel framework for detecting $SE(3)$ grasp poses from point cloud input. Our main contribution is an $SE(3)$-equivariant model that maps each point in the cloud to a continuous grasp quality function over the 2-sphere $S^2$, represented with spherical harmonic basis functions. Compared with reasoning about a finite set of samples, this formulation improves the accuracy and efficiency of our model in cases where a large number of samples would otherwise be needed. To accomplish this, we propose a novel variation on EquiFormerV2 that uses a UNet-style encoder-decoder architecture to increase the number of points the model can handle. The resulting method, which we name $\textit{OrbitGrasp}$, significantly outperforms baselines in both simulation and physical experiments.
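The central representational idea, a per-point grasp quality function over $S^2$ expanded in spherical harmonics, can be illustrated with a minimal NumPy sketch. It assumes real spherical harmonics truncated at degree $l \le 2$ (9 coefficients per point) and a hypothetical coefficient array standing in for network outputs. It is not the OrbitGrasp implementation, and the helper names (`real_sh_basis`, `fibonacci_sphere`, `grasp_quality`, `best_direction`) are illustrative. The point it shows: once coefficients are predicted, scoring any direction is a dot product, so many candidate directions can be evaluated without further network passes.

```python
import numpy as np


def real_sh_basis(dirs: np.ndarray) -> np.ndarray:
    """Evaluate the 9 orthonormal real spherical harmonics (l <= 2).

    dirs: (N, 3) array of unit vectors on S^2.
    Returns an (N, 9) array ordered l=0; l=1 (m=-1,0,1); l=2 (m=-2..2).
    """
    x, y, z = dirs[:, 0], dirs[:, 1], dirs[:, 2]
    c0 = 0.5 * np.sqrt(1.0 / np.pi)
    c1 = np.sqrt(3.0 / (4.0 * np.pi))
    c2 = 0.5 * np.sqrt(15.0 / np.pi)
    c20 = 0.25 * np.sqrt(5.0 / np.pi)
    c22 = 0.25 * np.sqrt(15.0 / np.pi)
    return np.stack([
        np.full_like(x, c0),            # l=0
        c1 * y, c1 * z, c1 * x,         # l=1, m=-1, 0, 1
        c2 * x * y, c2 * y * z,         # l=2, m=-2, -1
        c20 * (3.0 * z**2 - 1.0),       # l=2, m=0
        c2 * x * z,                     # l=2, m=1
        c22 * (x**2 - y**2),            # l=2, m=2
    ], axis=1)


def fibonacci_sphere(n: int) -> np.ndarray:
    """Return n nearly uniform unit vectors on S^2 (Fibonacci lattice)."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n
    r = np.sqrt(1.0 - z**2)
    phi = np.pi * (3.0 - np.sqrt(5.0)) * i  # golden-angle increments
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)


def grasp_quality(coeffs: np.ndarray, dirs: np.ndarray) -> np.ndarray:
    """Quality of each point (rows of coeffs, shape (P, 9)) at each direction.

    Returns a (P, N) array: a linear combination of basis functions.
    """
    return coeffs @ real_sh_basis(dirs).T


def best_direction(coeffs: np.ndarray, n_samples: int = 2000):
    """Score a dense set of directions cheaply and keep the best per point."""
    dirs = fibonacci_sphere(n_samples)
    q = grasp_quality(coeffs, dirs)
    idx = np.argmax(q, axis=1)
    return dirs[idx], q[np.arange(len(coeffs)), idx]


# Usage with hypothetical coefficients for two points.
coeffs = np.zeros((2, 9))
coeffs[0, 2] = 1.0  # only Y_1^0 (proportional to z): quality peaks toward +z
coeffs[1, 3] = 1.0  # only Y_1^1 (proportional to x): quality peaks toward +x
best_dirs, best_scores = best_direction(coeffs)
```

Truncating at low degree keeps the function smooth and cheap to evaluate; a real model would likely use a higher maximum degree for sharper peaks. Under a rotation $R$, the $l = 1$ coefficients (reordered as $x, y, z$) transform like an ordinary 3-vector, which hints at why an equivariant network can predict such coefficients directly.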