Recent advances in robotic mobile manipulation have spurred the expansion of the operating environment for robots from constrained workspaces to large-scale human environments. To complete tasks effectively in these spaces, robots must be able to perceive, reason, and execute over a diversity of affordances, well beyond simple pick-and-place. We posit that the notion of semantic frames provides a compelling representation for robot actions that is amenable to action-focused perception, task-level reasoning, action-level execution, and integration with language. Semantic frames, a product of the linguistics community, define the necessary elements, pre- and post-conditions, and the sequence of robot actions needed to successfully execute an action evoked by a verb phrase. In this work, we extend the semantic frame representation to robot manipulation actions and introduce the problem of Semantic Frame Execution And Localization for Perceiving Afforded Robot Actions (SEAL) as a graphical model. For the SEAL problem, we describe our nonparametric Semantic Frame Mapping (SeFM) algorithm for maintaining belief over a finite set of semantic frames as the locations of actions afforded to the robot. We show that language models such as GPT-3 are insufficient to address the generalized task execution covered by the SEAL formulation, and that SeFM provides robots with the efficient search strategies and long-term memory needed when operating in building-scale environments.
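As an illustration of the frame structure described above (necessary elements, pre- and post-conditions, and a sequence of robot actions evoked by a verb phrase), the following is a minimal sketch only. It assumes a purely symbolic, boolean world state; the class name `SemanticFrame`, its fields, the `grasp` example, and all state keys and action names are hypothetical and are not taken from the SEAL or SeFM implementation.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical symbolic world state: predicate name -> truth value.
State = Dict[str, bool]


@dataclass
class SemanticFrame:
    """Illustrative container for one semantic frame (assumed layout)."""
    verb: str                  # verb phrase that evokes the frame
    elements: List[str]        # objects the frame needs
    preconditions: State       # must hold before execution
    postconditions: State      # hold after successful execution
    actions: List[str]         # ordered low-level robot actions

    def is_executable(self, state: State) -> bool:
        # A missing predicate is treated as False (an assumption).
        return all(state.get(k, False) == v
                   for k, v in self.preconditions.items())

    def execute(self, state: State) -> State:
        # Return a new state with the postconditions applied;
        # the input state is left unmodified.
        if not self.is_executable(state):
            raise ValueError(f"preconditions of '{self.verb}' not met")
        new_state = dict(state)
        new_state.update(self.postconditions)
        return new_state


# Hypothetical frame for the verb phrase "grasp the cup".
grasp = SemanticFrame(
    verb="grasp",
    elements=["cup"],
    preconditions={"cup_visible": True, "hand_empty": True},
    postconditions={"hand_empty": False, "holding_cup": True},
    actions=["move_to_pregrasp", "close_gripper", "lift"],
)
```

Under these assumptions, calling `grasp.execute({"cup_visible": True, "hand_empty": True})` yields a state in which `holding_cup` is true, after which the same frame is no longer executable because `hand_empty` is false. The sketch covers only the frame representation; it does not model localization or the belief maintenance performed by SeFM.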