In this paper we investigate the notion of legibility in sequential decision tasks under uncertainty. Previous works that extend legibility to scenarios beyond robot motion either focus on deterministic settings or are too computationally expensive. Our proposed approach, dubbed PoL-MDP, handles uncertainty while remaining computationally tractable. We demonstrate the advantages of our approach over state-of-the-art approaches in several simulated scenarios of varying complexity. We also showcase the use of our legible policies as demonstrations for an inverse reinforcement learning agent, establishing their superiority over the commonly used demonstrations based on the optimal policy. Finally, we assess the legibility of our computed policies through a user study in which people are asked to infer the goal of a mobile robot following a legible policy by observing its actions.