Popular benchmarks for self-supervised LiDAR scene flow (stereoKITTI and FlyingThings3D) have unrealistic rates of dynamic motion, unrealistic correspondences, and unrealistic sampling patterns. As a result, progress on these benchmarks is misleading and may cause researchers to focus on the wrong problems. We evaluate a set of top methods on a suite of real-world datasets (Argoverse 2.0, Waymo, and nuScenes) and report several conclusions. First, we find that performance on stereoKITTI is negatively correlated with performance on real-world data. Second, we find that one of this task's key components -- removing the dominant ego-motion -- is better solved by classic ICP than by any tested method. Finally, we show that despite the emphasis placed on learning, most performance gains are caused by pre- and post-processing steps: piecewise-rigid refinement and ground removal. We demonstrate this through a baseline method that combines these processing steps with a learning-free test-time flow optimization. This baseline outperforms every evaluated method.
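To make the ego-motion component concrete, the following is a minimal sketch of what removing ego-motion from a flow field amounts to once a rigid sensor transform is available. It is an illustration only, not the baseline described above: the transform is assumed to be given (in practice it might come from ICP or another registration method), it is restricted to a yaw rotation plus a translation for brevity, and the function names `ego_motion_flow` and `residual_flow` are hypothetical.

```python
import math


def ego_motion_flow(points, yaw, translation):
    """Flow that a rigid transform induces on each point.

    Assumption: the transform is a rotation by `yaw` radians about the
    z-axis followed by `translation`. A full implementation would use a
    general rotation estimated by a registration method such as ICP.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    flows = []
    for x, y, z in points:
        # Where the point lands after the rigid transform.
        xr = c * x - s * y + tx
        yr = s * x + c * y + ty
        zr = z + tz
        # The induced flow is the displacement from the original position.
        flows.append((xr - x, yr - y, zr - z))
    return flows


def residual_flow(total_flow, ego_flow):
    """Subtract the ego-motion flow, leaving motion of dynamic objects."""
    return [
        tuple(t - e for t, e in zip(tf, ef))
        for tf, ef in zip(total_flow, ego_flow)
    ]


# Hypothetical two-point scene: the first point is static, the second moves.
points = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
ego = ego_motion_flow(points, yaw=0.0, translation=(-1.0, 0.0, 0.0))
total = [(-1.0, 0.0, 0.0), (-1.0, 0.5, 0.0)]
residual = residual_flow(total, ego)
```

In this toy scene the static point ends up with zero residual flow, while the moving point keeps only its own motion. This is why an accurate rigid alignment accounts for much of the flow in scenes that are mostly static.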