For robots to be useful outside labs and specialized factories, we need a way to teach them new useful behaviors quickly. Current approaches lack either the generality to onboard new tasks without task-specific engineering or the data efficiency to do so in an amount of time that enables practical use. In this work, we explore dense tracking as a representational vehicle to allow faster and more general learning from demonstration. Our approach uses Track-Any-Point (TAP) models to isolate the relevant motion in a demonstration and to parameterize a low-level controller that reproduces this motion across changes in the scene configuration. We show that this results in robust robot policies that can solve complex object-arrangement tasks such as shape-matching and stacking, and even full path-following tasks such as applying glue and sticking objects together, all from demonstrations that can be collected in minutes.