In human-robot collaboration, the objectives of the human are often unknown to the robot. Moreover, even assuming a known objective, the human behavior is also uncertain. In order to plan a robust robot behavior, a key preliminary question is then: How to derive realistic human behaviors given a known objective? A major issue is that such a human behavior should itself account for the robot behavior, otherwise collaboration cannot happen. In this paper, we rely on Markov decision models, representing the uncertainty over the human objective as a probability distribution over a finite set of objective functions (inducing a distribution over human behaviors). Based on this, we propose two contributions: 1) an approach to automatically generate an uncertain human behavior (a policy) for each given objective function while accounting for possible robot behaviors; and 2) a robot planning algorithm that is robust to the above-mentioned uncertainties and relies on solving a partially observable Markov decision process (POMDP) obtained by reasoning on a distribution over human behaviors. A co-working scenario allows conducting experiments and presenting qualitative and quantitative results to evaluate our approach.
翻译:在人机协作中,人类的目标往往不为机器人所知。此外,即使假设目标已知,人类行为也具有不确定性。为了规划鲁棒的机器人行为,一个关键的基础问题是:在已知目标的情况下,如何推导出真实的人类行为?一个主要挑战在于,这种人类行为本身必须考虑到机器人的行为,否则协作便无法实现。本文基于马尔可夫决策模型,将人类目标的不确定性表示为有限目标函数集合上的概率分布(从而生成人类行为的分布)。在此基础上,我们提出两项贡献:1)一种方法,能够针对每个给定目标函数自动生成不确定的人类行为(策略),同时考虑可能的机器人行为;2)一种机器人规划算法,该算法对上述不确定性具有鲁棒性,并通过求解基于人类行为分布推理得到的部分可观测马尔可夫决策过程(POMDP)来实现。通过协作场景实验,我们展示了定性与定量结果以评估所提方法。