Safety is critical during human-robot interaction. But -- because people are inherently unpredictable -- it is often difficult for robots to plan safe behaviors. Instead of relying on our ability to anticipate humans, here we identify robot policies that are robust to unexpected human decisions. We achieve this by formulating human-robot interaction as a zero-sum game, where (in the worst case) the human's actions directly conflict with the robot's objective. Solving for the Nash Equilibrium of this game provides robot policies that maximize safety and performance across a wide range of human actions. Existing approaches attempt to find these optimal policies by leveraging Hamilton-Jacobi analysis (which is intractable) or linear-quadratic approximations (which are inexact). By contrast, in this work we propose a computationally efficient and theoretically justified method that converges towards the Nash Equilibrium policy. Our approach (which we call MCLQ) leverages linear-quadratic games to obtain an initial guess at safe robot behavior, and then iteratively refines that guess with a Monte Carlo search. Not only does MCLQ provide real-time safety adjustments, but it also enables the designer to tune how conservative the robot is -- preventing the system from focusing on unrealistic human behaviors. Our simulations and user study suggest that this approach advances safety in terms of both computation time and expected performance. See videos of our experiments here: https://youtu.be/KJuHeiWVuWY.
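To make the two-stage structure described above concrete, below is a minimal illustrative sketch in Python. It assumes linear dynamics x' = A x + B u_R + D u_H, a zero-sum quadratic cost, a backward Riccati recursion for the LQ initial guess, and a simple sampled one-step lookahead for the Monte Carlo refinement. All matrices, bounds, sample counts, and function names are placeholder assumptions for exposition, not the paper's MCLQ implementation.

```python
# Illustrative sketch of the MCLQ idea: solve a zero-sum LQ game for an
# initial robot policy, then refine the action online with Monte Carlo search
# over sampled (bounded) human actions. Values are placeholders, not the
# authors' code.
import numpy as np

def solve_zero_sum_lq(A, B, D, Q, R, S, gamma, horizon):
    """Backward Riccati recursion for a finite-horizon zero-sum LQ game.

    The robot (input u) minimizes and the human (input v) maximizes
        sum_k  x'Qx + u'Ru - gamma^2 v'Sv.
    Larger gamma penalizes the adversary more heavily, i.e. a less
    conservative robot. Returns robot gains K_u[k] (u_k = K_u[k] @ x_k)
    and value matrices P[k].
    """
    m_u = B.shape[1]
    P = Q.copy()
    K_u, P_list = [], []
    for _ in range(horizon):
        # Saddle-point conditions give a linear system in the joint action [u; v].
        M = np.block([[R + B.T @ P @ B, B.T @ P @ D],
                      [D.T @ P @ B, D.T @ P @ D - gamma**2 * S]])
        rhs = -np.vstack([B.T @ P @ A, D.T @ P @ A])
        K = np.linalg.solve(M, rhs)            # stacked gains [K_u; K_v]
        Ku, Kv = K[:m_u], K[m_u:]
        Acl = A + B @ Ku + D @ Kv
        P = Q + Ku.T @ R @ Ku - gamma**2 * Kv.T @ S @ Kv + Acl.T @ P @ Acl
        K_u.append(Ku)
        P_list.append(P)
    K_u.reverse(); P_list.reverse()
    return K_u, P_list

def mclq_action(x, A, B, D, Q, R, Ku, P, u_bound, v_bound,
                n_candidates=32, n_human_samples=64, rng=None):
    """Monte Carlo refinement of the LQ initial guess at the current state x.

    Candidate robot actions are sampled around the LQ action; each candidate
    is scored by its worst one-step cost over human actions sampled from a
    bounded set (v_bound tunes how conservative the robot is).
    """
    rng = np.random.default_rng() if rng is None else rng
    u0 = Ku @ x                                          # LQ initial guess
    candidates = u0 + u_bound * rng.uniform(-1, 1, (n_candidates, *u0.shape))
    candidates[0] = u0                                   # always keep the LQ action
    vs = v_bound * rng.uniform(-1, 1, (n_human_samples, D.shape[1], 1))
    best_u, best_cost = u0, np.inf
    for u in candidates:
        # Worst sampled human response, one-step lookahead with terminal value P.
        nexts = A @ x + B @ u + D @ vs                   # (n_human_samples, n, 1)
        stage = (x.T @ Q @ x + u.T @ R @ u).item()
        value = np.einsum('kni,nm,kmi->k', nexts, P, nexts)
        worst = (stage + value).max()
        if worst < best_cost:
            best_u, best_cost = u, worst
    return best_u

if __name__ == "__main__":
    # Toy 2D example: the robot regulates the state to the origin while a
    # bounded "human" input pushes back (all values are illustrative).
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([[0.0], [0.1]])
    D = np.array([[0.0], [0.05]])
    Q, R, S = np.eye(2), 0.1 * np.eye(1), np.eye(1)
    gains, values = solve_zero_sum_lq(A, B, D, Q, R, S, gamma=3.0, horizon=50)
    x = np.array([[1.0], [0.0]])
    u = mclq_action(x, A, B, D, Q, R, gains[0], values[0],
                    u_bound=0.5, v_bound=0.2)
```

In this sketch the two tuning knobs are gamma (how strongly the adversarial human is penalized in the LQ game) and v_bound (how large a set of human actions the Monte Carlo search guards against); together they play the role of the conservativeness adjustment described in the abstract.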