Generalization remains one of the most important desiderata for robust robot learning systems. While recently proposed approaches show promise in generalizing to novel objects, semantic concepts, or visual distribution shifts, generalization to new tasks remains challenging. For example, a language-conditioned policy trained on pick-and-place tasks will not be able to generalize to a folding task, even if the arm trajectory of folding is similar to that of pick-and-place. Our key insight is that this kind of generalization becomes feasible if we represent the task through rough trajectory sketches. We propose a policy conditioning method based on such sketches, which we call RT-Trajectory, that is practical, easy to specify, and allows the policy to effectively perform new tasks that would otherwise be challenging. We find that trajectory sketches strike a balance between being detailed enough to express low-level, motion-centric guidance and being coarse enough to let the learned policy interpret the sketch in the context of situational visual observations. In addition, we show how trajectory sketches can provide a useful interface for communicating with robotic policies: they can be specified through simple human inputs such as drawings or videos, or through automated approaches such as modern image-generating or waypoint-generating methods. We evaluate RT-Trajectory at scale on a variety of real-world robotic tasks and find that it is able to perform a wider range of tasks than language-conditioned and goal-conditioned policies when provided with the same training data.
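To make the idea of conditioning on a rough trajectory sketch concrete, the following is a minimal illustration, not the RT-Trajectory implementation. It assumes, purely for illustration, that a sketch is a short list of 2D pixel waypoints drawn as a polyline on a blank canvas, and that the canvas is stacked as an extra channel on the camera image before it reaches the policy. The function names `rasterize_sketch` and `condition_observation`, the waypoint format, and the channel-stacking choice are all hypothetical and do not come from the text above.

```python
import numpy as np


def rasterize_sketch(waypoints, height, width, samples_per_segment=64):
    """Draw a polyline through (row, col) waypoints on a blank canvas.

    Hypothetical helper: the waypoint format and the single-channel
    encoding are assumptions made for this example.
    """
    canvas = np.zeros((height, width), dtype=np.float32)
    pts = np.asarray(waypoints, dtype=np.float32)
    for start, end in zip(pts[:-1], pts[1:]):
        # Sample the segment densely so no pixel gaps appear between waypoints.
        t = np.linspace(0.0, 1.0, samples_per_segment)[:, None]
        seg = np.rint(start + t * (end - start)).astype(int)
        rows = np.clip(seg[:, 0], 0, height - 1)
        cols = np.clip(seg[:, 1], 0, width - 1)
        canvas[rows, cols] = 1.0
    return canvas


def condition_observation(rgb, sketch):
    """Stack the sketch as a fourth channel on an (H, W, 3) observation."""
    return np.concatenate([rgb, sketch[..., None]], axis=-1)


# A coarse "move right, then down" sketch over a 32x32 placeholder image.
rgb = np.zeros((32, 32, 3), dtype=np.float32)
sketch = rasterize_sketch([(4, 4), (4, 20), (20, 20)], height=32, width=32)
policy_input = condition_observation(rgb, sketch)
```

Under these assumptions, the same three waypoints could come from a hand drawing, a video, or an automated waypoint generator. The policy only ever sees the rasterized result alongside its visual observation.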