Generalization remains one of the most important desiderata for robust robot learning systems. While recently proposed approaches show promise in generalization to novel objects, semantic concepts, or visual distribution shifts, generalization to new tasks remains challenging. For example, a language-conditioned policy trained on pick-and-place tasks will not be able to generalize to a folding task, even if the arm trajectory of folding is similar to pick-and-place. Our key insight is that this kind of generalization becomes feasible if we represent the task through rough trajectory sketches. We propose a policy conditioning method using such rough trajectory sketches, which we call RT-Trajectory, that is practical, easy to specify, and allows the policy to effectively perform new tasks that would otherwise be challenging to perform. We find that trajectory sketches strike a balance between being detailed enough to express low-level motion-centric guidance while being coarse enough to allow the learned policy to interpret the trajectory sketch in the context of situational visual observations. In addition, we show how trajectory sketches can provide a useful interface to communicate with robotic policies: they can be specified through simple human inputs like drawings or videos, or through automated methods such as modern image-generating or waypoint-generating methods. We evaluate RT-Trajectory at scale on a variety of real-world robotic tasks, and find that RT-Trajectory is able to perform a wider range of tasks compared to language-conditioned and goal-conditioned policies, when provided the same training data.
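To make the idea of conditioning on a rough trajectory sketch concrete, here is a minimal sketch of one way such an input could be built. It is an illustration under stated assumptions, not the representation used by RT-Trajectory: the abstract does not specify how sketches are encoded, so the choice of a single-channel binary mask stacked onto the RGB observation, and the helper names `rasterize_trajectory` and `condition_observation`, are hypothetical.

```python
import numpy as np


def rasterize_trajectory(waypoints, height, width, samples_per_segment=50):
    """Draw a 2D polyline of (x, y) pixel waypoints into a single-channel mask.

    Hypothetical helper: the mask encoding is an assumption for illustration.
    """
    sketch = np.zeros((height, width), dtype=np.float32)
    pts = np.asarray(waypoints, dtype=np.float32)
    for start, end in zip(pts[:-1], pts[1:]):
        # Densely sample points along the segment between two waypoints.
        t = np.linspace(0.0, 1.0, samples_per_segment)[:, None]
        seg = start + t * (end - start)
        # Round to pixel indices and clamp to the image bounds.
        cols = np.clip(np.rint(seg[:, 0]).astype(int), 0, width - 1)
        rows = np.clip(np.rint(seg[:, 1]).astype(int), 0, height - 1)
        sketch[rows, cols] = 1.0
    return sketch


def condition_observation(rgb, sketch):
    """Append the sketch as an extra channel: (H, W, 3) and (H, W) -> (H, W, 4)."""
    return np.concatenate([rgb, sketch[..., None]], axis=-1)


# Placeholder camera image and a coarse three-waypoint trajectory (x, y).
rgb = np.zeros((64, 64, 3), dtype=np.float32)
waypoints = [(5, 5), (30, 40), (60, 20)]
sketch = rasterize_trajectory(waypoints, height=64, width=64)
obs = condition_observation(rgb, sketch)
```

The resulting `obs` array is what a trajectory-conditioned policy could consume in place of a language instruction or goal image. Because the waypoints here are only pixel coordinates, the same function accepts sketches from any of the sources the abstract mentions (a human drawing, a trajectory extracted from video, or an automated waypoint generator), provided they are first converted to image-space points.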