Recent advances in deep monocular visual Simultaneous Localization and Mapping (SLAM) have achieved impressive accuracy and dense reconstruction capabilities, yet their robustness to scale inconsistency in large-scale indoor environments remains largely unexplored. Existing benchmarks are limited to room-scale or structurally simple settings, leaving the critical issues of intra-session scale drift and inter-session scale ambiguity insufficiently addressed. To fill this gap, we introduce the ScaleMaster dataset, the first benchmark explicitly designed to evaluate scale consistency under challenging scenarios such as multi-floor structures, long trajectories, repetitive views, and low-texture regions. We systematically analyze the vulnerability of state-of-the-art deep monocular visual SLAM systems to scale inconsistency, providing both quantitative and qualitative evaluations. Crucially, our analysis extends beyond traditional trajectory metrics to include a direct map-to-map quality assessment, using metrics such as Chamfer distance against high-fidelity 3D ground truth. Our results reveal that although recent deep monocular visual SLAM systems perform strongly on existing benchmarks, they suffer from severe scale-related failures in realistic, large-scale indoor environments. By releasing the ScaleMaster dataset and baseline results, we aim to establish a foundation for future research on scale-consistent and reliable visual SLAM systems.
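To make the map-to-map assessment concrete, the following is a minimal sketch of a symmetric Chamfer distance between an estimated point cloud and a ground-truth one. It is illustrative only: the abstract does not state which Chamfer variant is used, so the unsquared, summed two-direction form here is an assumption, as are the helper names `nearest_distance`, `chamfer_distance`, and `scale_points` and the toy square geometry. The brute-force nearest-neighbour search is suitable only for tiny clouds.

```python
import math


def nearest_distance(point, cloud):
    """Euclidean distance from `point` to its nearest neighbour in `cloud`."""
    return min(math.dist(point, other) for other in cloud)


def chamfer_distance(estimated, ground_truth):
    """Symmetric Chamfer distance (assumed variant): sum of the mean
    nearest-neighbour distances in both directions, without squaring."""
    est_to_gt = sum(nearest_distance(p, ground_truth) for p in estimated) / len(estimated)
    gt_to_est = sum(nearest_distance(q, estimated) for q in ground_truth) / len(ground_truth)
    return est_to_gt + gt_to_est


def scale_points(cloud, factor):
    """Uniformly scale every point about the origin (hypothetical helper
    used to mimic a map reconstructed at the wrong scale)."""
    return [tuple(factor * c for c in p) for p in cloud]


# Toy ground truth: the four corners of a unit square on the floor plane.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]

perfect = chamfer_distance(gt, gt)                    # identical maps
shrunk = chamfer_distance(scale_points(gt, 0.5), gt)  # map at half scale

print(f"identical maps: {perfect:.4f}")
print(f"half-scale map: {shrunk:.4f}")
```

In this toy case the half-scale map has a strictly larger Chamfer distance than the identical map, which is the kind of scale error a trajectory-only metric can miss once the trajectory has been aligned to ground truth with a free scale factor.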