We present a near real-time method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence, while simultaneously performing neural 3D reconstruction of the object. Our method works for arbitrary rigid objects, even when visual texture is largely absent. The object is assumed to be segmented in the first frame only. No additional information is required, and no assumption is made about the interaction agent. Key to our method is a Neural Object Field that is learned concurrently with a pose graph optimization process in order to robustly accumulate information into a consistent 3D representation capturing both geometry and appearance. A dynamic pool of posed memory frames is automatically maintained to facilitate communication between these two concurrent processes. Our approach handles challenging sequences with large pose changes, partial and full occlusion, untextured surfaces, and specular highlights. We show results on HO3D, YCBInEOAT, and BEHAVE datasets, demonstrating that our method significantly outperforms existing approaches. Project page: https://bundlesdf.github.io