A key challenge in model-free category-level pose estimation is extracting contextual object features that generalize across varying instances within a category. Recent approaches leverage foundational features to capture semantic and geometric cues from data, but they fail under partial visibility. We overcome this with a first-complete-then-aggregate feature-extraction strategy that exploits class priors. In this paper, we present GCE-Pose, a method that enhances pose estimation for novel instances by integrating a category-level global context prior. GCE-Pose first reconstructs global shape and semantics with the proposed Semantic Shape Reconstruction (SSR) module. Given an unseen partial RGB-D object instance, the SSR module reconstructs the instance's global geometry and semantics by deforming category-specific 3D semantic prototypes through a learned deep Linear Shape Model. We further introduce a Global Context Enhanced (GCE) feature fusion module that effectively fuses features from the partial RGB-D observation with the reconstructed global context. Extensive experiments validate the impact of our global context prior and the effectiveness of the GCE fusion module, demonstrating that GCE-Pose significantly outperforms existing methods on the challenging real-world datasets HouseCat6D and NOCS-REAL275. Our project page is available at https://colin-de.github.io/GCE-Pose/.
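To make the linear-shape-model idea concrete, here is a minimal NumPy sketch of the general technique: an instance shape is expressed as a category mean shape plus a weighted sum of learned deformation basis vectors. All names (`mean_shape`, `basis`, `coeffs`) and the random placeholder data are illustrative assumptions, not the paper's actual implementation or learned parameters.

```python
import numpy as np

# Hypothetical sketch of a linear shape model for category-level deformation:
# instance = mean shape + sum_k coeffs[k] * basis[k].
# Shapes and names are assumptions for illustration only.
rng = np.random.default_rng(0)
num_points, num_basis = 1024, 16

mean_shape = rng.standard_normal((num_points, 3))        # category prototype, (N, 3)
basis = rng.standard_normal((num_basis, num_points, 3))  # deformation basis, (K, N, 3)
coeffs = rng.standard_normal(num_basis)                  # per-instance coefficients, (K,)

# Deformed instance shape: prototype plus a linear combination of basis vectors.
# tensordot sums over the basis axis K, yielding an (N, 3) point set.
instance_shape = mean_shape + np.tensordot(coeffs, basis, axes=1)
print(instance_shape.shape)  # (1024, 3)
```

In a learned system the basis and the coefficient regressor would be trained per category, so that a low-dimensional coefficient vector predicted from a partial observation suffices to recover a complete instance shape.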