Simultaneous Localization and Mapping (SLAM) and Multi-Object Tracking (MOT) are pivotal tasks in autonomous driving and have attracted considerable research attention. SLAM endeavors to build a map in real time and determine the vehicle's pose in unfamiliar settings, while MOT focuses on the real-time identification and tracking of multiple dynamic objects. Despite their importance, the prevalent approach treats SLAM and MOT as independent modules within an autonomous vehicle system, leading to inherent limitations. Classical SLAM methodologies often rely on a static-environment assumption, suitable for indoor rather than dynamic outdoor scenarios. Conversely, conventional MOT techniques typically rely on the vehicle's known state, and the accuracy of object state estimation is constrained by this prior. To address these challenges, previous efforts introduced the unified SLAMMOT paradigm, yet primarily focused on simplistic motion patterns. In our team's previous work, IMM-SLAMMOT\cite{IMM-SLAMMOT}, we presented a novel methodology that incorporates multiple motion models into SLAMMOT, i.e., tightly coupled SLAM and MOT, and demonstrated its efficacy in LiDAR-based systems. This paper studies the feasibility and advantages of instantiating this methodology as visual SLAMMOT, bridging the gap between LiDAR- and vision-based sensing mechanisms. Specifically, we propose a visual SLAMMOT solution that considers multiple motion models and validate the inherent advantages of IMM-SLAMMOT in the visual domain.