This paper introduces GS-Pose, a unified framework for localizing and estimating the 6D pose of novel objects. GS-Pose begins with a set of posed RGB images of a previously unseen object and builds three distinct representations stored in a database. At inference, GS-Pose operates sequentially: it locates the object in the input image, estimates its initial 6D pose with a retrieval approach, and refines the pose with a render-and-compare method. The key insight is to apply the appropriate object representation at each stage of the process. In particular, for the refinement step, we leverage 3D Gaussian splatting, a recent differentiable rendering technique that offers high rendering speed and relatively low optimization time. Off-the-shelf toolchains and commodity hardware, such as mobile phones, can be used to capture new objects for the database. Extensive evaluations on the LINEMOD and OnePose-LowTexture datasets demonstrate excellent performance, establishing the new state of the art. Project page: https://dingdingcai.github.io/gs-pose.
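The render-and-compare idea described above can be illustrated with a minimal toy sketch (not the paper's implementation): treat rendering as a function of the pose, measure the discrepancy between the rendered and observed views, and descend on that loss. A real 3D Gaussian splatting renderer provides analytic gradients; this hypothetical 2D analogue substitutes finite differences and a point-set "renderer" to keep the example self-contained.

```python
import numpy as np

def render(points, theta):
    """Toy 'renderer': rotate a 2D point set by pose parameter theta."""
    c, s = np.cos(theta), np.sin(theta)
    return points @ np.array([[c, -s], [s, c]]).T

def loss(points, theta, observed):
    """Photometric-style residual between rendered and observed views."""
    return np.mean((render(points, theta) - observed) ** 2)

# Synthetic object and an 'observed image' rendered at the true pose.
object_pts = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]])
theta_true = 0.6
observed = render(object_pts, theta_true)

# Render-and-compare refinement: gradient descent on the pose,
# with finite differences standing in for a differentiable renderer.
theta, lr, eps = 0.0, 0.5, 1e-4
for _ in range(200):
    grad = (loss(object_pts, theta + eps, observed)
            - loss(object_pts, theta - eps, observed)) / (2 * eps)
    theta -= lr * grad
```

After the loop, `theta` converges to the true pose `0.6`; in GS-Pose the same compare-and-update principle operates on full renders of the 3D Gaussian representation rather than a toy point set.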