Prior dominant methods for scene flow estimation focus mainly on input from two consecutive frames, neglecting valuable information in the temporal domain. While recent trends shift toward multi-frame reasoning, these methods suffer from rapidly escalating computational costs as the number of frames grows. To leverage temporal information more efficiently, we propose DeltaFlow ($\Delta$Flow), a lightweight 3D framework that captures motion cues via a $\Delta$ scheme, extracting temporal features at minimal computational cost regardless of the number of frames. Additionally, scene flow estimation faces challenges such as imbalanced object class distributions and motion inconsistency. To tackle these issues, we introduce a Category-Balanced Loss that enhances learning on underrepresented classes and an Instance Consistency Loss that enforces coherent object motion, improving flow accuracy. Extensive evaluations on the Argoverse 2, Waymo, and nuScenes datasets show that $\Delta$Flow achieves state-of-the-art performance, with up to 22% lower error and $2\times$ faster inference than the next-best multi-frame supervised method, while also demonstrating strong cross-domain generalization. The code and trained model weights are open-sourced at https://github.com/Kin-Zhang/DeltaFlow.
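The abstract does not specify how the $\Delta$ scheme is computed; as a purely hypothetical illustration (the function name, feature shapes, and mean-aggregation choice are assumptions, not the paper's method), a delta-style temporal cue can be sketched as consecutive-frame feature differences aggregated into a fixed-size tensor, so the output cost downstream does not grow with the number of input frames:

```python
import numpy as np

def delta_temporal_features(frame_feats: np.ndarray) -> np.ndarray:
    """Hypothetical sketch of delta-based temporal feature extraction.

    frame_feats: shape (T, C, H, W), per-frame BEV-style feature maps.
    Returns a fixed-size (C, H, W) motion cue: consecutive-frame
    differences (deltas) are aggregated by averaging, so the result's
    size is independent of the number of frames T.
    """
    deltas = np.diff(frame_feats, axis=0)  # (T-1, C, H, W): frame-to-frame changes
    return deltas.mean(axis=0)             # aggregate deltas into one fixed-size cue
```

Note this is only a sketch of the general idea; the actual architecture, aggregation operator, and feature representation in $\Delta$Flow may differ.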