6D pose estimation of textureless objects is valuable for industrial robotic applications, yet it remains challenging due to the frequent loss of depth information. Current multi-view methods either rely on depth data or insufficiently exploit multi-view geometric cues, limiting their performance. In this paper, we propose DKPMV, a pipeline that achieves dense keypoint-level fusion using only multi-view RGB images as input. We design a three-stage progressive pose optimization strategy that leverages dense multi-view keypoint geometry. To enable effective dense keypoint fusion, we enhance the keypoint network with attentional aggregation and symmetry-aware training, improving prediction accuracy and resolving ambiguities on symmetric objects. Extensive experiments on the ROBI dataset demonstrate that DKPMV outperforms state-of-the-art multi-view RGB approaches and even surpasses RGB-D methods in the majority of cases. The code will be available soon.