Generalizing imitation-learned navigation policies to environments unseen during training remains a major challenge. We address this by conducting the first large-scale study of how data quantity and data diversity affect real-world generalization in end-to-end, map-free visual navigation. Using a curated 4,565-hour crowd-sourced dataset collected across 161 locations in 35 countries, we train policies for point-goal navigation and evaluate their closed-loop control performance on sidewalk robots operating in four countries, covering 125 km of autonomous driving. Our results show that large-scale training data enables zero-shot navigation in unknown environments, approaching the performance of policies trained with environment-specific demonstrations. Critically, we find that data diversity is far more important than data quantity: doubling the number of geographical locations in a training set decreases navigation errors by approximately 15%, while the performance benefit from adding data from existing locations saturates with very little data. We also observe that, with noisy crowd-sourced data, simple regression-based models outperform generative and sequence-based architectures. We release our policies, evaluation setup, and example videos on the project page.
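To make the reported diversity effect concrete, the following minimal sketch works through the arithmetic of a roughly 15% error reduction per doubling of training locations. It assumes, purely for illustration, that the reduction stays constant across successive doublings; the abstract states only the per-doubling figure. The function name `relative_error` and the location counts in the usage lines are hypothetical.

```python
import math

# From the abstract: doubling the number of locations cuts errors by ~15%.
REDUCTION_PER_DOUBLING = 0.15


def relative_error(num_locations: int, baseline_locations: int) -> float:
    """Navigation error relative to the baseline training set.

    ASSUMPTION: the ~15% reduction applies uniformly to every doubling,
    which is an extrapolation and not a result reported in the abstract.
    """
    if num_locations <= 0 or baseline_locations <= 0:
        raise ValueError("location counts must be positive")
    # Number of doublings (possibly fractional) from baseline to target.
    doublings = math.log2(num_locations / baseline_locations)
    return (1.0 - REDUCTION_PER_DOUBLING) ** doublings


if __name__ == "__main__":
    # Hypothetical location counts, chosen only to show the arithmetic.
    for n in (10, 20, 40, 80):
        print(f"{n:>3} locations -> {relative_error(n, 10):.4f} x baseline error")
```

Under this assumption, one doubling leaves 0.85 of the baseline error and two doublings leave 0.85² ≈ 0.72. By contrast, the abstract reports that adding more data from locations already in the training set stops helping after very little data.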