Spatial AI that can perform complex tasks from visual signals and cooperate with humans is highly anticipated. Achieving this requires a visual SLAM system that adapts easily to new scenes without pre-training and generates dense maps for downstream tasks in real time. No previous visual SLAM, whether learning-based or non-learning-based, satisfies all of these needs, owing to the intrinsic limitations of its components. In this work, we develop a visual SLAM system named Orbeez-SLAM, which combines an implicit neural representation with visual odometry to achieve these goals. Moreover, Orbeez-SLAM works with a monocular camera since it needs only RGB inputs, making it widely applicable in the real world. Results show that our SLAM is up to 800x faster than a strong baseline while producing superior rendering outcomes. Code link: https://github.com/MarvinChung/Orbeez-SLAM.