Imitation learning frameworks that learn robot control policies from demonstrators' motions via hand-mounted demonstration interfaces have attracted increasing attention. However, because demonstrators and robots differ in their physical characteristics, this approach faces two limitations: i) the demonstration data do not include robot actions, and ii) the demonstrated motions may be infeasible for robots. Both limitations make policy learning difficult. To address them, we propose Feasibility-Aware Behavior Cloning from Observation (FABCO). FABCO integrates behavior cloning from observation, which supplements the missing robot actions using a robot dynamics model, with feasibility estimation. In feasibility estimation, the demonstrated motions are evaluated with a robot dynamics model learned from the robot's execution data to assess whether they can be reproduced under the robot's dynamics. The estimated feasibility is used in two ways: multimodal feedback, which improves the demonstrator's motions, and feasibility-aware policy learning, which yields robust policies. Multimodal feedback conveys the estimated feasibility to the demonstrator through visual and haptic cues to encourage feasible demonstrated motions. Feasibility-aware policy learning reduces the influence of demonstrated motions that are infeasible for robots, enabling the learning of policies that robots can execute stably. We conducted experiments with 15 participants on two tasks and confirmed that FABCO improves imitation learning performance by a factor of more than 3.2 compared with the case without feasibility feedback.
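The two ideas in the abstract, scoring demonstrated transitions with a robot dynamics model and down-weighting infeasible ones during policy learning, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the hand-written linear dynamics `A`, `B`, the actuator limit `A_MAX`, the Gaussian-shaped score in `feasibility`, and the weighted least-squares linear policy in `fit_weighted_policy`. The paper learns its dynamics model from robot execution data and may define feasibility and the policy loss differently.

```python
import numpy as np

# Hypothetical robot dynamics s' = A s + B a with bounded actions.
# (Assumption: fixed by hand here; the paper learns the model from robot data.)
A = np.eye(2)
B = 0.1 * np.eye(2)
A_MAX = 1.0  # assumed actuator limit


def infer_action(s, s_next):
    """Inverse dynamics: action needed for s -> s_next, clipped to the limit."""
    a = np.linalg.solve(B, s_next - A @ s)
    return np.clip(a, -A_MAX, A_MAX)


def feasibility(s, s_next, sigma=0.05):
    """Score in [0, 1]; close to 1 when the model can reproduce the transition."""
    a = infer_action(s, s_next)
    predicted = A @ s + B @ a
    err = np.linalg.norm(predicted - s_next)
    return float(np.exp(-((err / sigma) ** 2)))


def fit_weighted_policy(states, actions, weights):
    """Weighted least squares for a linear policy a = s @ K."""
    w = np.sqrt(np.asarray(weights, dtype=float))[:, None]
    K, *_ = np.linalg.lstsq(w * states, w * actions, rcond=None)
    return K


# Demonstrated transitions: the first is reachable, the second is too fast.
s = np.zeros(2)
demo_next = [np.array([0.05, 0.0]), np.array([0.5, 0.0])]
scores = [feasibility(s, s_next) for s_next in demo_next]
print(scores)
```

In this sketch the scores would serve both roles described above: shown to the demonstrator as feedback, and passed as `weights` to `fit_weighted_policy` together with the actions returned by `infer_action`, so that infeasible transitions contribute little to the fitted policy.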