Imitation learning provides a powerful framework for goal-conditioned visual navigation in mobile robots, enabling obstacle avoidance while respecting human preferences and social norms. However, its effectiveness depends critically on the quality and diversity of training data. In this work, we show how classical geometric planners can be leveraged to generate synthetic trajectories that complement costly human demonstrations. We train Less is More (LiMo), a transformer-based visual navigation policy that predicts goal-conditioned SE(2) trajectories from a single RGB observation, and find that augmenting limited expert demonstrations with planner-generated supervision yields substantial performance gains. Through ablations and complementary qualitative and quantitative analyses, we characterize how dataset scale and diversity affect planning performance. We demonstrate real-robot deployment and argue that robust visual navigation is enabled not by simply collecting more demonstrations, but by strategically curating diverse, high-quality datasets. Our results suggest that scalable, embodiment-specific geometric supervision is a practical path toward data-efficient visual navigation.