Robotic systems that assist humans should be capable of adapting their behaviors to individual user preferences. For instance, users may want a robot arm to adjust the amount of force it applies while folding their laundry or cleaning furniture. Natural language provides an intuitive way for humans to communicate such preferences. Recent progress in language-conditioned robot policies has shown that robots can successfully use language prompts to determine what task to perform. However, extending the same approach to specify how the task should be performed requires detailed labels describing the preferences or styles of trajectories in the task data. Not only is collecting such annotations challenging, but conditioning directly on these labels may also fail to provide fine-grained control over a continuous range of behaviors. For example, it can be difficult to convey the exact force that a robot must apply through abstract instructions like "apply a bit more pressure than before". Therefore, in this work, we propose using language to reason over preferred behaviors instead of directly generating them. We first learn a structured latent representation that organizes user preferences according to differences in the corresponding trajectories. Then, given a preference prompt, we use a foundation model to interpret this latent space and choose a value that produces the desired behavior. Through both simulation and real-world experiments, we show that selecting robot behaviors from an intuitively structured latent space enables more precise adaptation to user preferences while requiring significantly fewer preference labels than language-conditioned policies.