Building on the success of Neural Radiance Fields (NeRFs), recent years have seen significant advances in novel view synthesis. These models capture a scene's volumetric radiance field and produce highly convincing, dense, photorealistic models using simple, differentiable rendering equations. Despite their popularity, these algorithms suffer from severe ambiguities inherent to visual data from an RGB sensor: images generated with view synthesis can look very believable while the underlying 3D model is often wrong. This considerably limits the usefulness of these models in practical applications such as robotics and Extended Reality (XR), where an accurate dense 3D reconstruction would otherwise be of significant value. In this technical report, we present the vital differences between view synthesis models and 3D reconstruction models. We also comment on why a depth sensor is essential for modeling accurate geometry in general outward-facing scenes under the current paradigm of novel view synthesis methods. Focusing on the structure-from-motion task, we demonstrate this need in practice by extending the Plenoxel radiance field model: we present an analytical differential approach for dense mapping and tracking with radiance fields based on RGB-D data, without a neural network. Our method achieves state-of-the-art results in both the mapping and tracking tasks while also being faster than competing neural network-based approaches.
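To make the RGB ambiguity concrete, the following is a minimal sketch of the standard emission-absorption quadrature commonly used by NeRF-style renderers. It is not the report's own implementation: the function name `render_ray`, its arguments, and the toy scene are assumptions chosen for illustration. It renders two density profiles along one ray through a uniformly gray region and shows that the colors agree while the implied depths differ.

```python
import math


def render_ray(sigmas, colors, ts):
    """Composite samples along a single ray (hypothetical helper).

    sigmas: volume density at each sample
    colors: (r, g, b) tuple at each sample
    ts:     strictly increasing sample distances along the ray
    Returns (rgb, expected_depth).
    """
    rgb = [0.0, 0.0, 0.0]
    depth = 0.0
    transmittance = 1.0  # fraction of light still unabsorbed before this sample
    for i, (sigma, color, t) in enumerate(zip(sigmas, colors, ts)):
        # Interval length to the next sample; the last sample reuses the
        # previous spacing (an assumption made to keep the sketch short).
        if i + 1 < len(ts):
            delta = ts[i + 1] - t
        else:
            delta = t - ts[i - 1] if i > 0 else 1.0
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this interval
        weight = transmittance * alpha          # contribution of this sample
        rgb = [acc + weight * ch for acc, ch in zip(rgb, color)]
        depth += weight * t
        transmittance *= 1.0 - alpha
    return rgb, depth


# Toy scene: every sample is the same gray, so color alone cannot locate the surface.
ts = [1.0, 2.0, 3.0, 4.0]
gray = [(0.5, 0.5, 0.5)] * 4

rgb_near, depth_near = render_ray([0.0, 1e3, 0.0, 0.0], gray, ts)  # surface at t = 2
rgb_far, depth_far = render_ray([0.0, 0.0, 1e3, 0.0], gray, ts)    # surface at t = 3

print(rgb_near, depth_near)
print(rgb_far, depth_far)
```

Under these assumptions a purely photometric loss cannot distinguish the two density profiles, whereas a loss on the rendered depth against a depth-sensor measurement would.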