When humans work out how to perform a particular task, they do so hierarchically, splitting higher-level tasks into smaller sub-tasks. However, in the literature on natural language (NL) command of situated agents, most works have treated the procedures to be executed as flat sequences of simple actions, and where hierarchies of procedures appear at all they have been shallow at best. In this paper, we propose a formalism of procedures as programs, a powerful yet intuitive method of representing hierarchical procedural knowledge for agent command and control. We further propose a modeling paradigm of hierarchical modular networks, which consist of a planner and reactors that convert NL intents into predictions of executable programs and probe the environment for the information needed to complete program execution. We instantiate this framework on the IQA and ALFRED datasets for NL instruction following. Our model outperforms reactive baselines by a large margin on both datasets. We also demonstrate that our framework is more data-efficient and that it allows for fast iterative development.
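As a rough illustration of the "procedures as programs" idea, the following minimal Python sketch represents a procedure as a tree of sub-procedures and primitive actions, with a stand-in planner and a stand-in reactor. Everything in it is an assumption made for illustration: the names `Action`, `Procedure`, `plan`, `locate_reactor`, and `flatten`, the toy dictionary environment, the fixed "put X in Y" intent pattern, and the way work is divided between planner and reactor. None of it is taken from the paper's model or from the IQA or ALFRED interfaces, and a real planner and reactors would be learned networks rather than hand-written rules.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

# Hypothetical toy environment: maps an object name to where it can be found.
Env = Dict[str, str]


@dataclass
class Action:
    """A primitive action the agent can execute directly."""
    name: str
    arg: str


@dataclass
class Procedure:
    """A named procedure whose body mixes primitive actions and sub-procedures."""
    name: str
    body: List[Union["Action", "Procedure"]] = field(default_factory=list)


def locate_reactor(env: Env, obj: str) -> str:
    """Stand-in reactor: probes the environment for a fact the program needs."""
    return env[obj]


def plan(intent: str, env: Env) -> Procedure:
    """Stand-in planner: turns an intent into a hierarchical program.

    Only handles the illustrative pattern "put <object> in <receptacle>".
    """
    words = intent.split()
    obj, dest = words[1], words[3]
    fetch = Procedure("fetch", [
        Action("goto", locate_reactor(env, obj)),
        Action("pickup", obj),
    ])
    place = Procedure("place", [
        Action("goto", locate_reactor(env, dest)),
        Action("put", dest),
    ])
    return Procedure("put_in", [fetch, place])


def flatten(step: Union[Action, Procedure]) -> List[Action]:
    """Expand a hierarchical program into its flat sequence of primitive actions."""
    if isinstance(step, Action):
        return [step]
    actions: List[Action] = []
    for sub in step.body:
        actions.extend(flatten(sub))
    return actions


if __name__ == "__main__":
    env = {"mug": "table", "sink": "counter"}
    program = plan("put mug in sink", env)
    for action in flatten(program):
        print(action.name, action.arg)
```

The flat action list that `flatten` produces corresponds to the kind of flat representation the abstract contrasts against, while the `Procedure` tree keeps the intermediate sub-tasks (`fetch`, `place`) explicit, so they can be inspected or reused.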