Simultaneous localization and mapping (SLAM) techniques can support navigation for visually impaired people, but the development of robust SLAM solutions for crowded spaces is limited by the lack of realistic datasets. To address this, we introduce InCrowd-VI, a novel visual-inertial dataset specifically designed for human navigation in indoor pedestrian-rich environments. Recorded using Meta Aria Project glasses, it captures realistic scenarios without environmental control. InCrowd-VI comprises 58 sequences totaling 5 km of trajectory length and 1.5 hours of recording time, and includes RGB images, stereo images, and IMU measurements. The dataset captures important challenges such as pedestrian occlusions, varying crowd densities, complex layouts, and lighting changes. Ground-truth trajectories, accurate to approximately 2 cm, are obtained from the Meta Aria Project machine perception SLAM service, and a semi-dense 3D point cloud of the scene is provided for each sequence. Evaluation of state-of-the-art visual odometry (VO) and SLAM algorithms on InCrowd-VI reveals severe performance limitations under these realistic conditions, demonstrating the need for and value of the new dataset in advancing SLAM research for visually impaired navigation in complex indoor environments.