Long-horizon online visual mapping is a core capability for robot perception, requiring continuous camera-motion and scene-geometry estimation from visual streams under bounded memory and computation. Recent feed-forward 3D reconstruction models provide strong geometric priors, but their streaming variants often predict poses in a fixed coordinate system tied to the first frame or a persistent scene memory. This fixed-gauge design leads to train--test mismatch, attention bias toward early anchors, and accumulated drift on sequences much longer than those seen during training. We propose \emph{Anchor3R}, a streaming 3D reconstruction framework that treats feed-forward reconstruction as current-centric local measurement prediction rather than persistent global-gauge regression. At each time step, Anchor3R predicts window-relative poses and a local pointmap in the current-frame coordinate system, turning streaming reconstruction into relative-pose measurement generation. These measurements support online pose updates, while loop-closure reinsertion and motion averaging align the trajectory and transform local pointmaps into a coherent global reconstruction. Experiments on indoor, outdoor, driving, and RGB-D benchmarks show that Anchor3R improves long-horizon pose accuracy and dense reconstruction quality over existing streaming baselines, while supporting bounded-memory online inference.
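To make the measurement view concrete, the following is a minimal sketch of how relative-pose measurements could be chained into a trajectory and used to place a local pointmap in a shared frame. It is an illustration under stated assumptions, not Anchor3R's implementation: the helper names (`make_pose`, `chain_relative_poses`, `transform_points`), the convention that each relative pose maps frame k+1 into frame k, and the use of frame 0 as the reference frame are all hypothetical choices made for this example. Loop-closure reinsertion and motion averaging are omitted entirely.

```python
import math


def matmul4(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def make_pose(yaw, tx, ty, tz):
    """Build a 4x4 rigid transform: rotation by `yaw` about z plus a translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def chain_relative_poses(relative_poses):
    """Accumulate reference-from-frame poses from frame-to-frame measurements.

    Assumption: relative_poses[k] maps points in frame k+1 into frame k.
    Frame 0 serves as the reference frame for this toy example only.
    """
    reference_from_frame = [make_pose(0.0, 0.0, 0.0, 0.0)]  # identity for frame 0
    for rel in relative_poses:
        # reference_from_(k+1) = reference_from_k * k_from_(k+1)
        reference_from_frame.append(matmul4(reference_from_frame[-1], rel))
    return reference_from_frame


def transform_points(pose, points):
    """Apply a 4x4 rigid transform to a list of (x, y, z) points."""
    return [tuple(pose[r][0] * x + pose[r][1] * y + pose[r][2] * z + pose[r][3]
                  for r in range(3))
            for (x, y, z) in points]


if __name__ == "__main__":
    # Two identical measurements: turn 90 degrees about z and move 1 unit along x.
    step = make_pose(math.pi / 2, 1.0, 0.0, 0.0)
    poses = chain_relative_poses([step, step])
    # A local pointmap (here a single point) predicted in the coordinates of frame 2.
    local_points = [(1.0, 0.0, 0.0)]
    print(transform_points(poses[2], local_points))
```

In this sketch, every pose after the first is obtained purely by composing relative measurements, so an error in one measurement propagates to all later frames. That accumulation is what the loop-closure and motion-averaging steps described in the abstract are meant to correct.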