Precise 6-DoF simultaneous localization and mapping (SLAM) from onboard sensors is critical for wearable devices capturing egocentric data, which poses specific challenges, such as a wider diversity of motions and viewpoints, prevalent dynamic visual content, and long sessions affected by time-varying sensor calibration. While recent progress on SLAM has been swift, academic research is still driven by benchmarks that do not reflect these challenges or do not offer sufficiently accurate ground-truth poses. In this paper, we introduce a new dataset and benchmark for visual-inertial SLAM with egocentric, multi-modal data. We record hours and kilometers of trajectories through a city center with glasses-like devices equipped with various sensors. We leverage surveying tools to obtain control points as indirect pose annotations that are metric, centimeter-accurate, and available at city scale. This makes it possible to evaluate extreme trajectories that involve walking at night or traveling in a vehicle. We show that state-of-the-art systems developed by academia are not robust to these challenges, and we identify the components responsible. In addition, we design tracks with different levels of difficulty to ease in-depth analysis and the evaluation of less mature approaches. The dataset and benchmark are available at https://www.lamaria.ethz.ch.