Estimating 6D object pose from a single RGB image often involves noise and indeterminacy caused by challenges such as occlusions and cluttered backgrounds. Meanwhile, diffusion models have shown strong performance in generating high-quality images from highly indeterminate random noise through step-by-step denoising. Inspired by this denoising capability, we propose a novel diffusion-based framework (6D-Diff) that handles the noise and indeterminacy in object pose estimation to improve performance. In our framework, to establish accurate 2D-3D correspondences, we formulate 2D keypoint detection as a reverse diffusion (denoising) process. To facilitate this denoising process, we design a Mixture-of-Cauchy-based forward diffusion process and condition the reverse process on the object features. Extensive experiments on the LM-O and YCB-V datasets demonstrate the effectiveness of our framework.
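As a rough illustration of the kind of heavy-tailed noise a Mixture-of-Cauchy-based forward process might draw on, the sketch below samples noisy 2D keypoint coordinates from a mixture of Cauchy distributions using only the Python standard library. The function names, the independent-axis simplification, and all parameter values are hypothetical choices made for this example; they are not taken from the 6D-Diff implementation.

```python
import math
import random


def sample_cauchy(loc, scale, rng):
    """Draw one sample from a 1D Cauchy(loc, scale) via the inverse CDF."""
    u = rng.random()  # uniform in [0, 1)
    return loc + scale * math.tan(math.pi * (u - 0.5))


def sample_mixture_of_cauchy(weights, locs, scales, rng):
    """Draw one 2D point from a mixture of Cauchy components.

    weights: mixing weights, one per component
    locs:    (x, y) location of each component
    scales:  positive scale of each component, shared by both axes

    Assumption for this sketch: the x and y axes are sampled independently.
    """
    # Choose a component according to the mixing weights.
    k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    x = sample_cauchy(locs[k][0], scales[k], rng)
    y = sample_cauchy(locs[k][1], scales[k], rng)
    return (x, y)


def noisy_keypoints(n, weights, locs, scales, seed=0):
    """Sample n noisy 2D keypoint candidates with a fixed seed."""
    rng = random.Random(seed)
    return [sample_mixture_of_cauchy(weights, locs, scales, rng) for _ in range(n)]


if __name__ == "__main__":
    # Hypothetical mixture: two candidate keypoint locations in pixel units.
    points = noisy_keypoints(
        n=5,
        weights=[0.7, 0.3],
        locs=[(120.0, 80.0), (140.0, 95.0)],
        scales=[2.0, 5.0],
    )
    for x, y in points:
        print(f"{x:.2f}, {y:.2f}")
```

Because a Cauchy distribution has heavy tails, a small fraction of samples land far from any component location. That makes it a plausible stand-in for keypoint estimates corrupted by occlusion or clutter, where occasional large errors are expected.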