Neural Scene Flow Prior (NSFP) is of significant interest to the vision community due to its inherent robustness to out-of-distribution (OOD) effects and its ability to handle dense lidar points. The approach uses a coordinate neural network to estimate scene flow at runtime, without any training. However, it is up to 100 times slower than current state-of-the-art learning methods. In other applications, such as image, video, and radiance function reconstruction, innovations in speeding up the runtime performance of coordinate networks have centered on architectural changes. In this paper, we demonstrate that scene flow is different: the dominant computational bottleneck stems from the loss function itself (i.e., Chamfer distance). Further, we rediscover the distance transform (DT) as an efficient, correspondence-free loss function that dramatically speeds up the runtime optimization. Our fast neural scene flow (FNSF) approach reports, for the first time, real-time performance comparable to learning methods, without any training or OOD bias, on two of the largest open autonomous driving (AV) lidar datasets: Waymo Open and Argoverse.
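To illustrate why a distance transform can stand in for Chamfer distance, the following is a minimal NumPy/SciPy sketch, not the authors' implementation. It assumes a uniform voxel grid and a nearest-voxel lookup; the function names (`build_dt`, `dt_loss`, `one_sided_chamfer`) and the `cell` and `margin` parameters are hypothetical and chosen only for illustration. A loss used for runtime optimization would also need a differentiable (e.g. interpolated) lookup, which this sketch omits.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt


def build_dt(target, cell=0.1, margin=1.0):
    """Precompute a voxelized distance transform of the target point cloud.

    `cell` (voxel size) and `margin` (grid padding) are illustrative choices.
    """
    origin = target.min(axis=0) - margin
    extent = target.max(axis=0) + margin - origin
    shape = np.ceil(extent / cell).astype(int) + 1
    occupied = np.zeros(shape, dtype=bool)
    idx = np.floor((target - origin) / cell).astype(int)
    occupied[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    # The EDT measures distance to the nearest zero, so occupied voxels must be 0.
    dt = distance_transform_edt(~occupied, sampling=cell)
    return dt, origin


def dt_loss(points, dt, origin, cell=0.1):
    """Mean DT value over the points: one table lookup each, no neighbour search."""
    idx = np.floor((points - origin) / cell).astype(int)
    idx = np.clip(idx, 0, np.array(dt.shape) - 1)  # clamp points outside the grid
    return dt[idx[:, 0], idx[:, 1], idx[:, 2]].mean()


def one_sided_chamfer(points, target):
    """Brute-force nearest-neighbour distance, shown only for comparison."""
    d = np.linalg.norm(points[:, None, :] - target[None, :, :], axis=-1)
    return d.min(axis=1).mean()
```

The grid is built once per target point cloud, so each optimization step costs only an index computation and a lookup per point, whereas the Chamfer term repeats a nearest-neighbour search every step. The price is a discretization error on the order of the voxel size.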