We propose a method for in-hand 3D scanning of an unknown object with a monocular camera. Our method relies on a neural implicit surface representation that captures both the geometry and the appearance of the object. In contrast to most NeRF-based methods, however, we do not assume that the camera-object relative poses are known. Instead, we simultaneously optimize both the object shape and the pose trajectory. As direct optimization over all shape and pose parameters is prone to fail without coarse-level initialization, we propose an incremental approach that starts by splitting the sequence into carefully selected overlapping segments within which the optimization is likely to succeed. We reconstruct the object shape and track its poses independently within each segment, then merge all the segments before performing a global optimization. We show that our method is able to reconstruct the shape and color of both textured and challenging texture-less objects, that it outperforms classical methods that rely only on appearance features, and that its performance is close to that of recent methods that assume known camera poses.
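To make the segment-splitting step concrete, here is a minimal Python sketch of how a frame sequence could be divided into overlapping segments. It is an illustration only: the abstract says the segments are "carefully selected" but does not give the selection criterion, so this sketch assumes fixed-length windows with a fixed overlap. The function name and the parameters `segment_len` and `overlap` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def split_into_overlapping_segments(
    num_frames: int, segment_len: int, overlap: int
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) frame ranges that cover [0, num_frames).

    Assumption: fixed-length windows sharing `overlap` frames with their
    neighbor. This stands in for the paper's segment selection, which the
    abstract does not specify.
    """
    if segment_len <= 0 or not 0 <= overlap < segment_len:
        raise ValueError("need segment_len > 0 and 0 <= overlap < segment_len")
    if num_frames <= 0:
        return []

    stride = segment_len - overlap  # how far each new segment advances
    segments: List[Tuple[int, int]] = []
    start = 0
    while True:
        end = min(start + segment_len, num_frames)  # clip the last segment
        segments.append((start, end))
        if end == num_frames:
            break
        start += stride
    return segments


if __name__ == "__main__":
    # Each range would be reconstructed and tracked independently, then the
    # shared frames could be used to align neighboring segments when merging.
    for start, end in split_into_overlapping_segments(10, 4, 1):
        print(f"frames {start}..{end - 1}")
```

The frames shared by consecutive segments are what would allow per-segment results to be brought into a common reference frame before a global optimization; how that merging is actually done is not described in the abstract and is not sketched here.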