CostNav: A Navigation Benchmark for Real-World Economic-Cost Evaluation of Physical AI Agents

Haebin Seong, Sungmin Kim, Yongjun Cho, Myunchul Joe, Geunwoo Kim, Yubeen Park, Sunhoo Kim, Yoonshik Kim, Suhwan Choi, Jaeyoon Jung, Jiyong Youn, Jinmyung Kwak, Sunghee Ahn, Jaemin Lee, Younggil Do, Seungyeop Yi, Woojin Cheong, Minhyeok Oh, Minchan Kim, Seongjae Kang, Samwoo Seong, Youngjae Yu, Yunsung Lee

Current navigation benchmarks prioritize task success in simplified settings and neglect the multidimensional economic constraints essential for the real-world commercialization of autonomous delivery systems. We introduce CostNav, an Economic Navigation Benchmark that evaluates physical AI agents through comprehensive economic cost-revenue analysis aligned with real-world business operations. By integrating industry-standard data, such as Securities and Exchange Commission (SEC) filings and Abbreviated Injury Scale (AIS) injury reports, with Isaac Sim's detailed collision and cargo dynamics, CostNav goes beyond simple task completion to evaluate business value in complex, real-world scenarios. To our knowledge, CostNav is the first physics-grounded economic benchmark that uses industry-standard regulatory and financial data to quantitatively expose the gap between navigation research metrics and commercial viability, revealing that optimizing for task success on a simplified task fundamentally differs from optimizing for real-world economic deployment. Evaluating seven baselines (two rule-based and five imitation-learning), we find that no current method is economically viable: all yield negative contribution margins. The best-performing method, CANVAS (-27.36\$/run), equipped with only an RGB camera and GPS, outperforms LiDAR-equipped Nav2 w/ GPS (-35.46\$/run). We challenge the community to develop navigation policies that achieve economic viability on CostNav. We remain method-agnostic, evaluating success solely on cost rather than the underlying architecture. All resources are available at https://github.com/worv-ai/CostNav.
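
To make the headline metric concrete, the following is a minimal sketch of a per-run contribution margin, using the standard definition (revenue minus variable costs). The `RunEconomics` class, its cost categories, and the dollar amounts in `example` are hypothetical illustrations and are not taken from CostNav's cost model; only the two margins in `reported` come from the abstract above.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RunEconomics:
    """Per-run revenue and variable costs in USD (hypothetical structure)."""
    revenue: float
    variable_costs: Dict[str, float]

    def contribution_margin(self) -> float:
        # Standard definition: revenue minus the sum of variable costs.
        return self.revenue - sum(self.variable_costs.values())

    def is_viable(self) -> bool:
        # A run is economically viable only if its margin is positive.
        return self.contribution_margin() > 0.0


# Hypothetical numbers, chosen only to illustrate the arithmetic.
example = RunEconomics(
    revenue=5.0,
    variable_costs={"energy": 0.5, "maintenance": 1.5, "collision_damage": 10.0},
)
margin = example.contribution_margin()

# Contribution margins reported in the abstract (USD per run).
reported = {"CANVAS": -27.36, "Nav2 w/ GPS": -35.46}
best = max(reported, key=reported.get)
gap = round(reported["CANVAS"] - reported["Nav2 w/ GPS"], 2)
any_viable = any(m > 0.0 for m in reported.values())
```

Under this definition a less negative margin is better, which is why CANVAS ranks above Nav2 w/ GPS even though neither is viable.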
