PECAN: Personalizing Robot Behaviors through a Learned Canonical Space

Robots should personalize how they perform tasks to match the needs of individual human users. Today's robots achieve this personalization by asking for the human's feedback in the task space. For example, an autonomous car might show the human two different ways to decelerate at stoplights and ask which of these motions they prefer. This approach to personalization is indirect: based on the behaviors the human selects (e.g., decelerating slowly), the robot tries to infer their underlying preference (e.g., defensive driving). By contrast, our paper develops a learning and interface-based approach that enables humans to directly indicate their desired style. We do this by learning an abstract, low-dimensional, and continuous canonical space from human demonstration data. Each point in the canonical space corresponds to a different style (e.g., defensive or aggressive driving), and users can directly personalize the robot's behavior by clicking on a point. Given the human's selection, the robot then decodes this canonical style across each task in the dataset. For example, if the human selects a defensive style, the autonomous car personalizes its behavior to drive defensively when decelerating, passing other cars, or merging onto highways. We refer to our resulting approach as PECAN: Personalizing Robot Behaviors through a Learned Canonical Space. Our simulations and user studies suggest that humans prefer using PECAN to directly personalize robot behavior (particularly once they become familiar with PECAN), and that users find the learned canonical space to be intuitive and consistent. See videos here: https://youtu.be/wRJpyr23PKI
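The idea of decoding one selected style across every task can be illustrated with a toy sketch. This is not the paper's model: a hand-written linear map stands in for the learned decoder, the canonical space is assumed to be the unit square, and every task name, weight, and unit below is a hypothetical placeholder chosen only to show how a single clicked point could drive consistent behavior across tasks.

```python
# Toy sketch only: a hand-written linear decoder stands in for a learned model.
# All task names, weights, and units below are hypothetical illustrations.

TASK_DECODERS = {
    # task parameter: (base value, weight on style x, weight on style y)
    "decelerate_peak_braking_mps2": (2.0, 3.0, 0.0),
    "passing_gap_m": (30.0, -20.0, 0.0),
    "merge_speed_mps": (20.0, 5.0, 2.0),
}


def decode_style(style, task):
    """Decode one canonical style point (x, y) into a behavior parameter for one task.

    Each coordinate is clamped to [0, 1], the assumed extent of the canonical space.
    """
    x, y = (min(1.0, max(0.0, v)) for v in style)
    base, wx, wy = TASK_DECODERS[task]
    return base + wx * x + wy * y


def personalize(style):
    """Apply the same selected style point to every task in the toy dataset."""
    return {task: decode_style(style, task) for task in TASK_DECODERS}


if __name__ == "__main__":
    # In this toy layout, x = 0 is treated as defensive and x = 1 as aggressive.
    print("defensive: ", personalize((0.0, 0.0)))
    print("aggressive:", personalize((1.0, 0.0)))
```

In this sketch, one click fixes the style for all tasks at once, which is the contrast with task-space feedback, where the human would be queried about each behavior separately.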
