Self-supervised feed-forward methods for scene flow estimation offer real-time efficiency, but their supervision from two-frame point correspondences is unreliable and often breaks down under occlusions. Multi-frame supervision has the potential to provide more stable guidance by incorporating motion cues from past frames, yet naive extensions of two-frame objectives are ineffective because point correspondences vary abruptly across frames, producing inconsistent signals. In this paper, we present TeFlow, which enables multi-frame supervision for feed-forward models by mining temporally consistent supervision. TeFlow introduces a temporal ensembling strategy that forms reliable supervisory signals by aggregating the most temporally consistent motion cues from a candidate pool built across multiple frames. Extensive evaluations demonstrate that TeFlow establishes a new state-of-the-art for self-supervised feed-forward methods, achieving performance gains of up to 33\% on the challenging Argoverse 2 and nuScenes datasets. Our method performs on par with leading optimization-based methods while running 150 times faster. The code is open-sourced at https://github.com/KTH-RPL/OpenSceneFlow along with trained model weights.
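To make the temporal ensembling idea concrete, the following is a minimal, hypothetical sketch (not the paper's actual implementation): each point is given a pool of candidate motion cues, one per past frame pair, and the selected pseudo-label is the candidate that deviates least from the per-point median across the pool, i.e. the most temporally consistent one. The function name and the `(T, N, 3)` candidate layout are illustrative assumptions.

```python
import numpy as np

def temporal_ensemble(candidates: np.ndarray) -> np.ndarray:
    """Select temporally consistent motion cues (illustrative sketch).

    candidates: (T, N, 3) array of flow candidates for N points,
                one candidate per each of T frame pairs.
    Returns an (N, 3) array: for each point, the candidate flow
    closest to that point's median flow across the T candidates.
    """
    median = np.median(candidates, axis=0)               # (N, 3) per-point median flow
    dist = np.linalg.norm(candidates - median, axis=-1)  # (T, N) deviation of each candidate
    best = np.argmin(dist, axis=0)                       # (N,) index of most consistent candidate
    return candidates[best, np.arange(candidates.shape[1])]
```

The median acts as a robust reference: a single occlusion-corrupted frame pair produces an outlier candidate that is far from the median and is therefore never selected as supervision.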