In this work, we present Vox-Fusion, a dense tracking and mapping system that seamlessly fuses neural implicit representations with traditional volumetric fusion methods. Our approach is inspired by the recently developed implicit mapping and positioning system and extends the idea so that it can be freely applied to practical scenarios. Specifically, we leverage a voxel-based neural implicit surface representation to encode and optimize the scene inside each voxel. Furthermore, we adopt an octree-based structure to divide the scene and support dynamic expansion, which enables our system to track and map arbitrary scenes without the prior knowledge of the environment that previous works require. Moreover, we propose a high-performance multi-process framework to speed up the method, making it suitable for applications that require real-time performance. The evaluation results show that our method achieves better accuracy and completeness than previous methods. We also show that Vox-Fusion can be used in augmented reality and virtual reality applications. Our source code is publicly available at https://github.com/zju3dv/Vox-Fusion.
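The abstract does not spell out how the octree allocates voxels or expands. The following is a minimal Python sketch, under our own assumptions, of one way a sparse octree can allocate leaf voxels on demand and grow its root whenever a new observation falls outside the current bounds, so no scene extent is needed up front. The class `SparseVoxelOctree`, the `leaf_size` and `feat_dim` parameters, and the zero-initialized feature lists standing in for learned per-voxel embeddings are hypothetical and are not taken from the Vox-Fusion code.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Idx3 = Tuple[int, int, int]


@dataclass
class Node:
    """Octree node covering the integer voxel cube [origin, origin + size)^3."""
    origin: Idx3
    size: int  # edge length in leaf voxels; always a power of two
    children: List[Optional["Node"]] = field(default_factory=lambda: [None] * 8)
    # Hypothetical placeholder for a learned embedding; set only on allocated leaves.
    feature: Optional[List[float]] = None


class SparseVoxelOctree:
    """Hypothetical sketch: sparse octree with on-demand leaves and root growth."""

    def __init__(self, leaf_size: float = 0.2, feat_dim: int = 8):
        self.leaf_size = leaf_size  # metric edge length of one leaf voxel (assumed)
        self.feat_dim = feat_dim
        self.root = Node(origin=(0, 0, 0), size=1)
        self.num_leaves = 0

    def voxel_index(self, p) -> Idx3:
        # Integer voxel coordinates keep all octree arithmetic exact.
        return tuple(math.floor(c / self.leaf_size) for c in p)

    @staticmethod
    def _contains(node: Node, v: Idx3) -> bool:
        return all(node.origin[i] <= v[i] < node.origin[i] + node.size for i in range(3))

    def _expand_to(self, v: Idx3) -> None:
        # Dynamic expansion: double the root toward v until v lies inside it;
        # the old root becomes the child occupying the matching octant.
        while not self._contains(self.root, v):
            old, s = self.root, self.root.size
            grow_neg = [v[i] < old.origin[i] for i in range(3)]
            origin = tuple(old.origin[i] - s if grow_neg[i] else old.origin[i] for i in range(3))
            new_root = Node(origin=origin, size=2 * s)
            new_root.children[sum(1 << i for i in range(3) if grow_neg[i])] = old
            self.root = new_root

    def insert(self, p) -> bool:
        """Allocate the leaf containing point p; return True if it was newly created."""
        v = self.voxel_index(p)
        self._expand_to(v)
        node = self.root
        while node.size > 1:
            half = node.size // 2
            bits = [v[i] >= node.origin[i] + half for i in range(3)]
            idx = sum(1 << i for i in range(3) if bits[i])
            if node.children[idx] is None:
                origin = tuple(node.origin[i] + half * bits[i] for i in range(3))
                node.children[idx] = Node(origin=origin, size=half)
            node = node.children[idx]
        if node.feature is None:
            node.feature = [0.0] * self.feat_dim
            self.num_leaves += 1
            return True
        return False

    def find(self, p) -> Optional[Node]:
        """Return the allocated leaf containing p, or None if it was never observed."""
        v = self.voxel_index(p)
        if not self._contains(self.root, v):
            return None
        node = self.root
        while node is not None and node.size > 1:
            half = node.size // 2
            node = node.children[sum(1 << i for i in range(3) if v[i] >= node.origin[i] + half)]
        return node if node is not None and node.feature is not None else None


# Usage: points arrive frame by frame with no scene bounds given in advance.
tree = SparseVoxelOctree(leaf_size=0.2)
for p in [(0.05, 0.1, 0.0), (0.15, 0.0, 0.1), (3.1, -1.3, 0.5), (-5.1, 2.1, 1.1)]:
    tree.insert(p)
print(tree.num_leaves, tree.root.size)  # → 3 64
```

The first two points fall into the same leaf, so only three voxels are allocated, while the root grows to 64 leaf widths to cover the farthest point. Storing voxel indices as integers rather than metric floats is a design choice of this sketch: it avoids rounding drift when child origins are derived from parent origins during descent and expansion.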