Quantifying Assistive Robustness Via the Natural-Adversarial Frontier

Our ultimate goal is to build robust policies for robots that assist people. What makes this hard is that people can behave unexpectedly at test time, potentially interacting with the robot outside its training distribution and leading to failures. Even just measuring robustness is a challenge. Adversarial perturbations are the default measure, but they can paint the wrong picture: they can correspond to human motions that are unlikely to occur during natural interaction. A robot policy might fail under small adversarial perturbations but work under large natural perturbations. We propose that capturing robustness in these interactive settings requires constructing and analyzing the entire natural-adversarial frontier: the Pareto frontier of human policies that offer the best trade-offs between naturalness and low robot performance. We introduce RIGID, a method for constructing this frontier by training adversarial human policies that trade off between minimizing robot reward and acting human-like (as measured by a discriminator). On an Assistive Gym task, we use RIGID to analyze the performance of standard collaborative reinforcement learning, as well as the performance of existing methods meant to increase robustness. We also compare the frontier RIGID identifies with the failures identified in expert adversarial interaction, and with naturally occurring failures during user interaction. Overall, we find evidence that RIGID can provide a meaningful measure of robustness that is predictive of deployment performance, and can uncover failure cases in human-robot interaction that are difficult to find manually. Website: https://ood-human.github.io
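To make the frontier concrete, the following is a minimal sketch, not the RIGID implementation. It assumes each trained adversarial human policy has been summarized by two numbers: a naturalness score (for example, a discriminator output, higher meaning more human-like) and the robot's resulting reward. The function names, the weight `lam`, and the linear form of the scalarized objective are hypothetical choices made for illustration; the abstract does not specify them.

```python
from typing import List, Tuple

# (naturalness, robot_reward) for one candidate human policy.
# Both fields are assumed summaries, not quantities defined in the abstract.
Point = Tuple[float, float]


def adversary_reward(robot_reward: float, naturalness: float, lam: float) -> float:
    """Hypothetical scalarized objective for an adversarial human policy.

    Lower robot reward and higher naturalness both increase the value;
    lam sets how much naturalness is weighted against adversarial strength.
    """
    return -robot_reward + lam * naturalness


def natural_adversarial_frontier(points: List[Point]) -> List[Point]:
    """Return the Pareto-optimal points, sorted by naturalness.

    A point q dominates p when q is at least as natural and gives the robot
    at most as much reward, with a strict improvement in at least one of the two.
    """
    frontier = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] <= p[1] and (q[0] > p[0] or q[1] < p[1])
            for q in points
        )
        if not dominated:
            frontier.append(p)
    return sorted(frontier)


if __name__ == "__main__":
    # Made-up scores for five hypothetical human policies.
    candidates = [(0.9, 80.0), (0.5, 40.0), (0.2, 10.0), (0.4, 60.0), (0.9, 90.0)]
    for naturalness, robot_reward in natural_adversarial_frontier(candidates):
        print(f"naturalness={naturalness:.1f}  robot_reward={robot_reward:.1f}")
```

Sweeping a weight such as `lam` and keeping only the non-dominated policies is one simple way to trace out such a frontier. A robot policy whose reward stays high even at the low-naturalness end would then count as robust across the whole trade-off, rather than at a single perturbation level.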
