Consistency Matters: Defining Demonstration Data Quality Metrics in Robot Learning from Demonstration

Learning from Demonstration (LfD) empowers robots to acquire new skills through human demonstrations, making it feasible for everyday users to teach robots. However, the success of learning and generalization depends heavily on the quality of these demonstrations. Consistency is often used as an indicator of quality in LfD, yet the factors that define this consistency remain underexplored. In this paper, we evaluate a comprehensive set of motion data characteristics to determine which consistency measures best predict learning performance. By ensuring demonstration consistency prior to training, we enhance models' predictive accuracy and generalization to novel scenarios. We validate our approach with two user studies involving participants with diverse levels of robotics expertise. In the first study (N = 24), users taught a PR2 robot to perform a button-pressing task in a constrained environment; in the second study (N = 30), participants trained a UR5 robot on a pick-and-place task. Results show that demonstration consistency significantly impacts success rates in both learning and generalization: our consistency metrics predict 70% and 89% of task success rates in the first and second studies, respectively, and estimate generalized performance success rates with 76% and 91% accuracy. These findings suggest that our proposed measures provide an intuitive, practical way to assess demonstration data quality before training, without requiring expert data or algorithm-specific modifications. By formalizing consistency metrics, our approach offers a systematic way to evaluate demonstration quality, addressing a critical gap in LfD and enhancing the reliability of robot learning from human demonstrations.
