We present Online3R, a new sequential reconstruction framework that adapts to new scenes through online learning, effectively resolving reconstruction inconsistencies. Specifically, we introduce a set of lightweight, learnable visual prompts into a pretrained, frozen geometry foundation model; the prompts capture knowledge of the new environment while the foundation model retains its fundamental capability for geometry prediction. Updating these visual prompts at test time poses two problems: ground truth is unavailable, and the update must be highly efficient. To address both, we introduce a local-global self-supervised learning strategy that enforces local and global consistency constraints on the predictions. The local consistency constraints are imposed between intermediate results and previously fused local results, so the model is trained with high-quality pseudo ground-truth signals. The global consistency constraints are applied to sparse keyframes spanning long distances rather than to every frame, so the model learns efficiently from consistent predictions over a long trajectory. Our experiments demonstrate that Online3R outperforms previous state-of-the-art methods on various benchmarks. Project page: https://shunkaizhou.github.io/online3r-1.0/
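The idea of updating only a small prompt while the backbone stays frozen can be illustrated with a deliberately tiny toy. This is not the Online3R implementation: the linear "model", the names `FROZEN_WEIGHTS`, `predict`, `adapt_prompt`, and `consistency_loss`, and the learning-rate and step settings are all hypothetical. The sketch also collapses the local and global terms into a single pseudo-target loss, assuming the pseudo targets stand in for previously fused results.

```python
# Toy sketch only: a frozen linear "model" plus a learnable prompt that is
# updated at test time against pseudo targets. All names are hypothetical
# and nothing here reproduces the actual Online3R architecture or losses.

FROZEN_WEIGHTS = [0.5, -1.0, 2.0]  # stands in for the frozen foundation model


def predict(features, prompt):
    """Apply the frozen weights to the prompt-shifted input features."""
    return sum(w * (x + p) for w, x, p in zip(FROZEN_WEIGHTS, features, prompt))


def consistency_loss(frames, pseudo_targets, prompt):
    """Mean squared disagreement between predictions and pseudo targets."""
    errors = [(predict(f, prompt) - t) ** 2 for f, t in zip(frames, pseudo_targets)]
    return sum(errors) / len(errors)


def adapt_prompt(frames, pseudo_targets, steps=200, lr=0.05):
    """Gradient descent on the prompt only; FROZEN_WEIGHTS is never modified."""
    prompt = [0.0] * len(FROZEN_WEIGHTS)
    for _ in range(steps):
        grad = [0.0] * len(prompt)
        for features, target in zip(frames, pseudo_targets):
            residual = predict(features, prompt) - target
            # Gradient of 0.5 * residual**2 with respect to prompt[i] is residual * w_i.
            for i, w in enumerate(FROZEN_WEIGHTS):
                grad[i] += residual * w / len(frames)
        prompt = [p - lr * g for p, g in zip(prompt, grad)]
    return prompt


if __name__ == "__main__":
    frames = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0], [2.0, 0.0, 1.0]]
    no_prompt = [0.0, 0.0, 0.0]
    # Pseudo targets: the frozen prediction plus a scene-specific offset.
    targets = [predict(f, no_prompt) + 0.7 for f in frames]
    learned = adapt_prompt(frames, targets)
    print("loss before:", consistency_loss(frames, targets, no_prompt))
    print("loss after: ", consistency_loss(frames, targets, learned))
```

In this toy the prompt can only absorb a constant, scene-specific offset, which is enough to show the division of labor: the frozen weights keep the general prediction ability, and the small set of prompt parameters carries the adaptation.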