Learning provides a powerful tool for vision-based navigation, but the capabilities of learning-based policies are constrained by limited training data. If we could combine data from all available sources, including multiple kinds of robots, we could train more powerful navigation models. In this paper, we study how a general goal-conditioned model for vision-based navigation (GNM) can be trained on data obtained from many distinct but structurally similar robots, enabling broad generalization across environments and embodiments. We analyze the design decisions necessary for effective data sharing across robots, including the use of temporal context and standardized action spaces, and demonstrate that an omnipolicy trained on heterogeneous datasets outperforms policies trained on any single dataset. We curate 60 hours of navigation trajectories from 6 distinct robots and deploy the trained GNM on a range of new robots, including an underactuated quadrotor. We find that training on diverse data leads to robustness against degradation in sensing and actuation. A pre-trained navigation model with broad generalization capabilities can bootstrap applications on novel robots going forward, and we hope that the GNM represents a step in that direction. For more information on the datasets, code, and videos, see our project page: https://sites.google.com/view/drive-any-robot.