Urban sidewalk navigation is challenging because of complex structural layouts, dynamic pedestrian behavior, and long travel distances. Recent visual navigation models offer a promising solution, but the lack of a unified benchmark hinders quantitative, reproducible evaluation. To bridge this gap, we propose SidewalkBench, a comprehensive benchmark for visual navigation on urban sidewalks. Built on NVIDIA Isaac Sim, SidewalkBench provides GPU-accelerated simulation of diverse, high-fidelity sidewalk environments, including both procedurally generated and real-world scanned scenes. We further populate these scenes with rich, reactive, event-based pedestrian behaviors driven by flexible and efficient animation, enabling standardized model evaluation under realistic conditions. We evaluate 9 visual navigation models on 330 unit-test scenarios, 800 pedestrian-reactive scenarios, and 105 long-horizon scenarios. Our findings show that pedestrian interaction and long-horizon robustness remain critical bottlenecks for existing models, and that scaling up sidewalk training with synthetic data is a promising direction.