Multimotion Visual Odometry (MVO)

Source: arXiv. Under review for the International Journal of Robotics Research (IJRR), Manuscript #IJR-21-4311. 25 pages, 14 figures, 11 tables. Videos available at https://www.youtube.com/watch?v=mNj3s1nf-6A and https://www.youtube.com/playlist?list=PLbaQBz4TuPcxMIXKh5Q80s0N9ISezFcpi

Visual motion estimation is a well-studied challenge in autonomous navigation. Recent work has focused on addressing multimotion estimation in highly dynamic environments. These environments not only comprise multiple, complex motions but also tend to exhibit significant occlusion. Estimating third-party motions simultaneously with the sensor egomotion is difficult because an object's observed motion consists of both its true motion and the sensor motion. Most previous works in multimotion estimation simplify this problem by relying on appearance-based object detection or application-specific motion constraints. These approaches are effective in specific applications and environments but do not generalize well to the full multimotion estimation problem (MEP). This paper presents Multimotion Visual Odometry (MVO), a multimotion estimation pipeline that estimates the full SE(3) trajectory of every motion in the scene, including the sensor egomotion, without relying on appearance-based information. MVO extends the traditional visual odometry (VO) pipeline with multimotion segmentation and tracking techniques. It uses physically founded motion priors to extrapolate motions through temporary occlusions and identify the reappearance of motions through motion closure. Evaluations on real-world data from the Oxford Multimotion Dataset (OMD) and the KITTI Vision Benchmark Suite demonstrate that MVO achieves good estimation accuracy compared to similar approaches and is applicable to a variety of multimotion estimation challenges.
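The abstract's point that an object's observed motion mixes its true motion with the sensor motion can be illustrated with a small SE(3) composition. The sketch below is not the MVO pipeline. It is a minimal illustration, assuming rigid motions over one frame interval, known transforms, and hypothetical names (`se3`, `apply`, `ego_motion`, `object_motion`) and numeric values chosen only for the example.

```python
import numpy as np

def se3(yaw, translation):
    """Build a 4x4 SE(3) matrix from a rotation about z and a translation."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = translation
    return T

def apply(T, points):
    """Apply an SE(3) transform to an (N, 3) array of points."""
    return points @ T[:3, :3].T + T[:3, 3]

# Hypothetical motions over one frame interval, both expressed in the
# sensor frame at time k (illustrative values, not from the paper).
ego_motion = se3(0.05, [0.5, 0.0, 0.0])      # sensor pose at time k+1
object_motion = se3(-0.20, [0.0, 0.3, 0.0])  # true rigid motion of the object

# Apparent motion of points as seen from the moving sensor:
# static points appear to move by the inverse of the egomotion,
# object points by the inverse egomotion composed with the true motion.
apparent_static = np.linalg.inv(ego_motion)
apparent_object = np.linalg.inv(ego_motion) @ object_motion

# Check the composition against explicit point transforms.
points_k = np.array([[2.0, 1.0, 0.5], [2.5, 1.2, 0.4], [2.2, 0.8, 0.6]])
observed_next = apply(np.linalg.inv(ego_motion), apply(object_motion, points_k))
assert np.allclose(observed_next, apply(apparent_object, points_k))

# Once the egomotion is known from the static points, the object's
# true motion can be separated from its apparent motion.
recovered_object_motion = np.linalg.inv(apparent_static) @ apparent_object
assert np.allclose(recovered_object_motion, object_motion)
```

In this toy setting both apparent motions are given. In practice they would have to be estimated from segmented point tracks, which is the harder part the abstract describes.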
