This letter presents KGpose, a novel end-to-end framework for 6D pose estimation of multiple objects. Our approach combines a keypoint-based method with learnable pose regression through a `keypoint-graph', a graph representation of the keypoints. KGpose first estimates 3D keypoints for each object using attentional multi-modal fusion of RGB and point cloud features. These keypoints are estimated from each point of the point cloud and converted into a graph representation. The network then directly regresses 6D pose parameters for each point through a sequence of keypoint-graph embedding and local graph embedding modules, both designed with graph convolutions, followed by rotation and translation heads. The final pose for each object is selected from the point-wise pose candidates. The method achieves competitive results on the benchmark dataset, demonstrating the effectiveness of our model. KGpose enables multi-object pose estimation without requiring an extra object localization step, offering a unified and efficient solution for understanding geometric context in complex scenes for robotic applications.
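To make the pipeline concrete, the following is a minimal NumPy sketch of the core idea described above: per-point 3D keypoint sets are treated as small graphs, embedded with a graph convolution, and regressed to point-wise rotation (unit quaternion) and translation candidates. All function names, weight shapes, and the fully connected adjacency are illustrative assumptions for exposition, not the paper's actual architecture.

```python
import numpy as np

def graph_conv(node_feats, adj, weight):
    """One simplified graph-convolution layer: mean-aggregate neighbor
    features, project with a learned weight, apply ReLU."""
    deg = adj.sum(axis=1, keepdims=True)
    agg = adj @ node_feats / np.maximum(deg, 1.0)
    return np.maximum(agg @ weight, 0.0)

def pointwise_pose_candidates(keypoints, w_embed, w_rot, w_trans):
    """keypoints: (N, K, 3) array, K predicted 3D keypoints per point.
    Builds a keypoint graph per point (assumed fully connected here),
    embeds it, then regresses a quaternion and translation per point."""
    n_points, n_kp, _ = keypoints.shape
    adj = np.ones((n_kp, n_kp)) - np.eye(n_kp)  # fully connected graph (assumption)
    quats, trans = [], []
    for kp in keypoints:                        # per-point keypoint set
        h = graph_conv(kp, adj, w_embed)        # (K, D) node embeddings
        g = h.mean(axis=0)                      # pool graph to one vector
        q = g @ w_rot
        q = q / np.linalg.norm(q)               # unit quaternion for rotation
        quats.append(q)
        trans.append(g @ w_trans)               # 3D translation candidate
    return np.stack(quats), np.stack(trans)

# Toy usage with random weights: 5 points, 8 keypoints each.
rng = np.random.default_rng(0)
kps = rng.normal(size=(5, 8, 3))
w_embed = rng.normal(size=(3, 16))
w_rot = rng.normal(size=(16, 4))
w_trans = rng.normal(size=(16, 3))
q_candidates, t_candidates = pointwise_pose_candidates(kps, w_embed, w_rot, w_trans)
```

In the actual method the candidates would then be filtered to a single final pose per object; here the sketch stops at producing the per-point candidate set.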