Human interaction is essential for issuing personalized instructions and assisting robots when failure is likely. However, robots remain largely black boxes, offering users little insight into their evolving capabilities and limitations. To address this gap, we present explainable object-oriented HRI (X-OOHRI), an augmented reality (AR) interface that conveys robot action possibilities and constraints through visual signifiers, radial menus, color coding, and explanation tags. Our system encodes object properties and robot limits into object-oriented structures using a vision-language model, allowing explanation generation on the fly and direct manipulation of virtual twins spatially aligned within a simulated environment. We integrate the end-to-end pipeline with a physical robot and showcase diverse use cases ranging from low-level pick-and-place to high-level instructions. Finally, we evaluate X-OOHRI through a user study and find that participants effectively issue object-oriented commands, develop accurate mental models of robot limitations, and engage in mixed-initiative resolution.