Learning reward functions from demonstrations assumes that the demonstrations provide adequate supervision over all features, that is, over all task-relevant aspects of behavior. In practice, demonstrations are often imperfect: humans may under-emphasize certain features due to cognitive load or physical difficulty, or the training regime may fail to cover all relevant situations. In either case, important features may be underspecified, leading to ambiguity in the learned reward function and misaligned behavior at deployment. We propose a framework that detects such underspecified features and actively solicits targeted corrective demonstrations. Our key insight is that demonstrations implicitly reveal which features are well specified: features that are consistently optimized show little variation across demonstrations, while underspecified features vary widely. We leverage this statistical signal to infer which features may have been insufficiently demonstrated. The robot then explains in natural language which features it is uncertain about and queries for demonstrations that explicitly address the identified gaps. We evaluate our approach in a simulated tabletop manipulation domain and in a user study with a real Franka robot. Targeted, explanation-guided queries significantly improve reward recovery compared to random querying and passive data collection, reducing ambiguity that would otherwise persist when learning from imperfect demonstrations.
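The variance signal described above can be illustrated with a minimal sketch. This is not the paper's method: the function name `flag_underspecified`, the array layout (one row per demonstration, one column per feature), the assumption that features are already scaled to a comparable range, and the fixed threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def flag_underspecified(feature_values, threshold=0.1):
    """Return indices of features whose spread across demonstrations is large.

    feature_values: (n_demos, n_features) array, one row per demonstration.
    Assumes (hypothetically) that features are scaled to a comparable range,
    so a single threshold on the standard deviation is meaningful.
    """
    feature_values = np.asarray(feature_values, dtype=float)
    # Per-feature standard deviation across demonstrations.
    spread = feature_values.std(axis=0)
    # High spread is read as "not consistently optimized".
    return np.flatnonzero(spread > threshold).tolist()

# Three demonstrations, two illustrative features.
# Feature 0 barely changes; feature 1 varies widely.
demos = [
    [0.10, 0.2],
    [0.12, 0.9],
    [0.11, 0.5],
]
flagged = flag_underspecified(demos, threshold=0.1)
```

Here only the second feature is flagged, and it would be the natural target for an explanation and a follow-up demonstration query. A fixed threshold is the simplest possible decision rule; the framework in the abstract may use a different statistical test.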