Robot decision-making increasingly relies on expressive data-driven human prediction models when operating around people. While these models are known to suffer from prediction errors in out-of-distribution interactions, not all prediction errors equally impact downstream robot performance. We show that the mathematical notion of regret precisely characterizes the degree to which incorrect predictions of future interaction outcomes degrade closed-loop robot performance. However, canonical regret measures can be poorly calibrated across diverse deployment interactions. We derive a calibrated regret metric that evaluates the quality of robot decisions in probability space rather than reward space. With this transformation, our metric removes the need for explicit reward functions to calculate the robot's regret, enables fairer comparison of interaction anomalies across disparate deployment contexts, and facilitates targeted dataset construction of "system-level" prediction failures. We experimentally quantify the value of this high-regret interaction data for improving the robot's downstream decision-making. In a suite of closed-loop autonomous driving simulations, we find that fine-tuning ego-conditioned behavior predictors exclusively on high-regret human-robot interaction data can improve the robot's overall re-deployment performance with significantly less data (77% less).
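The contrast between reward-space and probability-space regret can be illustrated with a minimal sketch. It is not the paper's formulation: it assumes the robot's planner exposes a likelihood for each candidate decision given the human behavior actually observed, and it takes regret to be the gap between the most likely decision in hindsight and the executed one. The function names, the threshold, and all numbers are hypothetical.

```python
from typing import List, Sequence


def reward_regret(rewards: Sequence[float], executed: int) -> float:
    """Canonical regret: best reward achievable in hindsight minus the
    reward of the decision the robot actually executed."""
    return max(rewards) - rewards[executed]


def probability_regret(likelihoods: Sequence[float], executed: int) -> float:
    """Assumed probability-space analogue: gap between the most likely
    decision in hindsight and the executed decision. Lies in [0, 1]
    whenever the inputs are probabilities, whatever the reward scale."""
    return max(likelihoods) - likelihoods[executed]


def select_high_regret(regrets: Sequence[float], threshold: float) -> List[int]:
    """Indices of interactions whose regret meets a (hypothetical) threshold,
    e.g. to assemble a fine-tuning set of system-level failures."""
    return [i for i, r in enumerate(regrets) if r >= threshold]


# Two contexts whose reward functions live on very different scales
# (illustrative numbers only). The robot executed decision 1 in both.
highway_rewards = [100.0, 90.0, 50.0]
parking_rewards = [1.0, 0.75, 0.5]
print(reward_regret(highway_rewards, 1))  # reward units of the highway task
print(reward_regret(parking_rewards, 1))  # reward units of the parking task

# With decision likelihoods, both contexts share the same bounded scale.
highway_probs = [0.5, 0.25, 0.25]
parking_probs = [0.5, 0.25, 0.25]
regrets = [
    probability_regret(highway_probs, 1),
    probability_regret(parking_probs, 1),
]
print(regrets, select_high_regret(regrets, threshold=0.2))
```

In this toy setting the two reward-space regrets are not directly comparable because each inherits the scale of its own reward function, while the probability-space values share a common range and can be thresholded together.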