Humans often demonstrate diverse behaviors due to their personal preferences, for instance, preferences related to their individual execution style or personal margin for safety. In this paper, we consider the problem of integrating both path and velocity preferences into trajectory planning for robotic manipulators. We first learn reward functions that represent the user's path and velocity preferences from kinesthetic demonstration. We then optimize the trajectory in two steps, first the path and then the velocity, to produce trajectories that adhere to both task requirements and user preferences. We design a set of parameterized features that capture the fundamental preferences in a pick-and-place type of object-transportation task, in both the shape and the timing of the motion. We demonstrate that our method is capable of generalizing such preferences to new scenarios. We implement our algorithm on a Franka Emika 7-DoF robot arm, and validate the functionality and flexibility of our approach in a user study. The results show that non-expert users are able to teach the robot their preferences with just a few iterations of feedback.