Scene flow estimation is a long-standing problem in computer vision, where the goal is to estimate the 3D motion of a scene from consecutive observations. Recently, there have been efforts to compute scene flow directly from 3D point clouds. A common approach is to train a regression model that consumes the source and target point clouds and outputs a per-point translation vector. An alternative is to learn point matches between the point clouds while concurrently regressing a refinement of the initial correspondence flow. In both cases, the learning task is very challenging because the flow regression is done in free 3D space, and a typical solution is to resort to a large annotated synthetic dataset. We introduce SCOOP, a new method for scene flow estimation that can be learned on a small amount of data without employing ground-truth flow supervision. In contrast to previous work, we train a pure correspondence model focused on learning point feature representation and initialize the flow as the difference between a source point and its softly corresponding target point. Then, in the run-time phase, we directly optimize a flow refinement component with a self-supervised objective, which leads to a coherent and accurate flow field between the point clouds. Experiments on widely used datasets demonstrate the performance gains achieved by our method compared to existing leading techniques while using a fraction of the training data. Our code is publicly available at https://github.com/itailang/SCOOP.
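The flow initialization described above, the difference between a source point and its softly corresponding target point, can be illustrated with a minimal NumPy sketch. This is an illustration under stated assumptions rather than the SCOOP implementation: the function name `soft_correspondence_flow`, the cosine-similarity matching, the softmax weighting, and the `temperature` parameter are all hypothetical choices made here, and the per-point features are assumed to come from some already-trained network. The actual matching scheme in the released code may differ.

```python
import numpy as np


def soft_correspondence_flow(src_pts, tgt_pts, src_feat, tgt_feat, temperature=0.1):
    """Initial flow from soft correspondences (illustrative sketch only).

    src_pts: (N, 3) source points, tgt_pts: (M, 3) target points.
    src_feat: (N, D) and tgt_feat: (M, D) per-point features, assumed given.
    Returns an (N, 3) array of per-point flow vectors.
    """
    # Assumption: match points by cosine similarity of their features.
    src_n = src_feat / np.linalg.norm(src_feat, axis=1, keepdims=True)
    tgt_n = tgt_feat / np.linalg.norm(tgt_feat, axis=1, keepdims=True)
    sim = src_n @ tgt_n.T  # (N, M) similarity matrix

    # Assumption: turn similarities into matching weights with a softmax.
    logits = sim / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)  # each row sums to 1

    # Softly corresponding target point: weighted average of target points.
    soft_tgt = weights @ tgt_pts  # (N, 3)

    # Flow is the difference between the soft target and the source point.
    return soft_tgt - src_pts
```

As a sanity check, if the target cloud is the source cloud shifted by a constant vector and the features identify each point unambiguously, a low temperature should recover that shift as the flow for every point. The run-time refinement step mentioned in the abstract is not covered by this sketch.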