Programming robots to perform complex tasks is often difficult and time-consuming, requiring expert knowledge of robot software and sometimes hardware. Imitation learning trains robots to perform tasks by leveraging human expertise through demonstrations. Typically, those demonstrations are assumed to come from a single, highly competent expert. However, in many real-world applications that rely on user demonstrations or combine user data with pretrained data, such as home robotics (including assistive robots), this is unlikely to be the case. This paper presents research towards a system that can leverage suboptimal demonstrations to solve ambiguous tasks and, in particular, learn from its own failures. The resulting negative-feedback system significantly outperforms purely positive imitation learning on ambiguous tasks: it achieves a 90% improvement in success rate over a system that does not utilise negative feedback, and a 50% improvement when deployed on a real robot. It also demonstrates higher efficacy, memory efficiency and time efficiency than a comparable negative-feedback scheme. The novel scheme is validated through simulated and real-robot experiments.