Streaming recurrent models enable efficient 3D reconstruction by maintaining persistent state representations. However, they suffer from catastrophic forgetting over long sequences because they must balance historical information against new observations. Recent methods alleviate this by deriving adaptive signals from an attention perspective, but they operate along a single dimension and do not account for temporal and spatial consistency. To this end, we propose a training-free framework, termed TTSA3R, that leverages both temporal state evolution and spatial observation quality for adaptive state updates in 3D reconstruction. In particular, we devise a Temporal Adaptive Update Module that regulates the update magnitude by analyzing temporal state evolution patterns. We then introduce a Spatial Contextual Update Module that localizes the spatial regions requiring updates through observation-state alignment and scene dynamics. These complementary signals are finally fused to determine the state update strategy. Extensive experiments demonstrate the effectiveness of TTSA3R on diverse 3D tasks. Moreover, on extended reconstruction sequences our method exhibits only a 1.33x error increase, compared to over 4x degradation for the baseline model, significantly improving long-term reconstruction stability. Our code is available at https://github.com/anonus2357/ttsa3r.
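To make the adaptive update idea concrete, the following is a minimal sketch (not the authors' implementation) of fusing a temporal gate with a spatial gate to modulate a recurrent state update. All function names, tensor shapes, and the specific gating and fusion rules here are illustrative assumptions; the actual TTSA3R modules are defined in the paper and code release.

```python
# Hypothetical sketch of temporal + spatial gated state updates.
# Shapes, gate definitions, and the fusion rule are assumptions for illustration.
import torch
import torch.nn.functional as F

def temporal_gate(prev_state: torch.Tensor, cand_state: torch.Tensor) -> torch.Tensor:
    """Scalar update magnitude derived from how strongly the state is evolving."""
    # Larger frame-to-frame state change -> permit a larger update.
    drift = (cand_state - prev_state).norm(dim=-1).mean()
    return torch.sigmoid(drift)  # value in (0, 1)

def spatial_gate(state_tokens: torch.Tensor, obs_tokens: torch.Tensor) -> torch.Tensor:
    """Per-token update weights from observation-state alignment (cosine similarity)."""
    sim = F.cosine_similarity(state_tokens, obs_tokens, dim=-1)  # (N,)
    # Poorly aligned regions (low similarity) receive stronger updates.
    return torch.sigmoid(-sim).unsqueeze(-1)                     # (N, 1)

def adaptive_update(prev_state, cand_state, obs_tokens):
    """Fuse both signals into a gated, training-free state update."""
    t = temporal_gate(prev_state, cand_state)   # scalar magnitude
    s = spatial_gate(prev_state, obs_tokens)    # per-token localization
    alpha = t * s                               # fused update gate
    return (1 - alpha) * prev_state + alpha * cand_state

# Toy usage: 1024 state tokens of dimension 256.
prev = torch.randn(1024, 256)
cand = torch.randn(1024, 256)
obs = torch.randn(1024, 256)
updated = adaptive_update(prev, cand, obs)
```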