6D object pose estimation in cluttered scenes remains challenging due to severe occlusion and sensor noise. We propose MAPRPose, a two-stage framework that leverages mask-aware correspondences for pose proposal and amodal-driven Region-of-Interest (ROI) prediction for robust refinement. In the Mask-Aware Pose Proposal (MAPP) stage, we lift 2D correspondences into 3D space to establish reliable keypoint matches and generate geometrically consistent pose hypotheses based on correspondence-level scoring, from which the top-$K$ candidates are selected. In the refinement stage, we introduce a tensorized render-and-compare pipeline integrated with an Amodal Mask Prediction and ROI Re-Alignment (AMPR) module. By reconstructing complete object geometry and dynamically adjusting the ROI, AMPR mitigates localization errors and spatial misalignment under heavy occlusion. Furthermore, our GPU-accelerated RGB-XYZ reprojection enables simultaneous refinement of all $N \times B$ pose hypotheses in a single forward pass. Evaluated on the BOP benchmark, MAPRPose achieves a state-of-the-art Average Recall (AR) of 76.5%, outperforming FoundationPose by 3.1% AR while delivering a 43x speedup in multi-object inference.
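To make the proposal stage concrete, the following is a minimal sketch of lifting 2D correspondences into 3D and fitting a rigid pose hypothesis to them. It is an illustration under stated assumptions, not the MAPRPose implementation: the abstract does not specify the solver or the scoring rule, so a pinhole back-projection, a Kabsch (SVD) fit, and an inlier-ratio score are swapped in here. The function names `backproject`, `kabsch`, and `inlier_ratio`, and the tolerance `tol`, are hypothetical.

```python
import numpy as np


def backproject(uv, depth, K):
    """Lift pixels (N, 2) with depth (N,) to camera-frame points (N, 3).

    Assumes a pinhole camera with intrinsics K and no lens distortion.
    """
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    x = (uv[:, 0] - cx) * depth / fx
    y = (uv[:, 1] - cy) * depth / fy
    return np.stack([x, y, depth], axis=1)


def kabsch(model_pts, cam_pts):
    """Least-squares rigid transform (R, t) such that cam ~= R @ model + t.

    Stand-in solver (assumption): the abstract does not name one.
    """
    mu_m = model_pts.mean(axis=0)
    mu_c = cam_pts.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (model_pts - mu_m).T @ (cam_pts - mu_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_c - R @ mu_m
    return R, t


def inlier_ratio(R, t, model_pts, cam_pts, tol=0.01):
    """Hypothetical correspondence-level score: fraction of matches within tol."""
    residual = np.linalg.norm(model_pts @ R.T + t - cam_pts, axis=1)
    return float((residual < tol).mean())
```

In a full pipeline, one would presumably fit many such hypotheses from different correspondence subsets, score each one, and keep the top-$K$ by score for refinement; that selection loop is omitted here.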