Current navigation benchmarks prioritize task success in simplified settings but neglect the multidimensional economic constraints essential to commercializing autonomous delivery systems in the real world. We introduce CostNav, an Economic Navigation Benchmark that evaluates physical AI agents through comprehensive cost-revenue analysis aligned with real-world business operations. By integrating industry-standard data, such as SEC filings and AIS injury reports, with Isaac Sim's detailed collision and cargo dynamics, CostNav goes beyond simple task completion to evaluate business value in complex, real-world scenarios. To our knowledge, CostNav is the first work to quantitatively expose the gap between navigation research metrics and commercial viability, revealing that optimizing for task success on a simplified task fundamentally differs from optimizing for real-world economic deployment. Our evaluation of rule-based Nav2 navigation shows that current approaches are not economically viable: the contribution margin is -$22.81/run (AMCL) and -$12.87/run (GPS), and because the margin per run is negative, no number of runs reaches a break-even point. We challenge the community to develop navigation policies that achieve economic viability on CostNav. The benchmark is method-agnostic: it judges success solely on cost metrics, not on the underlying architecture. All resources are available at https://github.com/worv-ai/CostNav.
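The break-even claim follows from standard cost accounting: fixed costs are recovered only if each run contributes a positive margin. The sketch below illustrates that arithmetic with the two margins reported above. It is not CostNav's cost model; the function name `break_even_runs` and the fixed-cost figure are hypothetical and chosen only for illustration.

```python
import math
from typing import Optional


def break_even_runs(fixed_cost: float, margin_per_run: float) -> Optional[int]:
    """Return the number of runs needed to recover fixed_cost.

    Returns None when the contribution margin per run is zero or
    negative, because every additional run then fails to reduce
    (or actively deepens) the loss.
    """
    if margin_per_run <= 0:
        return None
    return math.ceil(fixed_cost / margin_per_run)


# Contribution margins per run as reported in the abstract (USD).
reported_margins = {"AMCL": -22.81, "GPS": -12.87}

# Hypothetical fixed cost, for illustration only; not taken from the paper.
HYPOTHETICAL_FIXED_COST = 10_000.0

results = {
    name: break_even_runs(HYPOTHETICAL_FIXED_COST, margin)
    for name, margin in reported_margins.items()
}

for name, runs in results.items():
    status = "no break-even point" if runs is None else f"break-even after {runs} runs"
    print(f"{name}: {status}")
```

Both reported configurations yield no break-even point regardless of the fixed cost assumed, which is why a policy must first reach a positive margin per run before deployment volume matters.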