Simultaneous localization and mapping (SLAM) techniques can support navigation for visually impaired people, but the development of robust SLAM solutions for crowded spaces is limited by the lack of realistic datasets. To address this, we introduce InCrowd-VI, a novel visual-inertial dataset specifically designed for human navigation in indoor pedestrian-rich environments. Recorded using Meta Aria Project glasses, it captures realistic scenarios without environmental control. InCrowd-VI comprises 58 sequences totaling 5 km of trajectories and 1.5 hours of recording, including RGB images, stereo images, and IMU measurements. The dataset captures important challenges such as pedestrian occlusions, varying crowd densities, complex layouts, and lighting changes. Ground-truth trajectories, accurate to approximately 2 cm, are obtained from the Meta Aria Project machine perception SLAM service. In addition, a semi-dense 3D point cloud of the scene is provided for each sequence. The evaluation of state-of-the-art visual odometry (VO) and SLAM algorithms on InCrowd-VI revealed severe performance limitations in these realistic scenarios. Under challenging conditions, systems exceeded both the required localization accuracy of 0.5 meters and the 1\% drift threshold, with classical methods exhibiting drift of up to 5-10\%. While deep learning-based approaches maintained high pose estimation coverage (>90\%), they failed to achieve the real-time processing speeds necessary for walking-pace navigation. These results demonstrate the need for, and value of, a new dataset to advance SLAM research for visually impaired navigation in complex indoor environments. The dataset and associated tools are publicly available at https://incrowd-vi.cloudlab.zhaw.ch/.
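To make the reported thresholds concrete, the metrics referenced above (localization error against ground truth and end-point drift as a fraction of path length) can be sketched as follows. This is a minimal illustrative sketch, not part of the dataset's released tooling; the function names and the assumption of time-aligned, same-frame position arrays are ours.

```python
import numpy as np

def trajectory_length(positions):
    """Total path length: sum of distances between consecutive positions (N x 3)."""
    return float(np.linalg.norm(np.diff(positions, axis=0), axis=1).sum())

def ate_rmse(est, gt):
    """Absolute trajectory error (RMSE) between time-aligned position estimates.

    Assumes `est` and `gt` are already associated frame-by-frame and expressed
    in the same reference frame (e.g. after a rigid alignment step).
    """
    err = np.linalg.norm(est - gt, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))

def drift_percent(est, gt):
    """End-point drift as a percentage of the ground-truth path length."""
    return 100.0 * float(np.linalg.norm(est[-1] - gt[-1])) / trajectory_length(gt)
```

Under this convention, a sequence fails the abstract's criteria when `ate_rmse` exceeds 0.5 m or `drift_percent` exceeds 1; the 5-10\% classical-method drift corresponds to an end-point error of 50-100 m over a 1 km path.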