Communication via natural language is a crucial aspect of intelligence, and it requires computational models to learn and reason about world concepts under varying levels of supervision. While significant progress has been made on fully supervised, non-interactive tasks such as question answering and procedural text understanding, much of the community has turned to sequential interactive tasks, as in semi-Markov text-based games, which have revealed limitations of existing approaches in coherence, contextual awareness, and the ability to learn effectively from the environment. In this paper, we propose a framework for improving the functional grounding of agents in text-based games. Specifically, we consider two forms of domain knowledge that we inject into learning-based agents: memory of previous correct actions and affordances of relevant objects in the environment. Our framework supports three representative model classes: 'pure' reinforcement learning (RL) agents, RL agents enhanced with knowledge graphs, and agents equipped with language models. Furthermore, we devise multiple injection strategies for these domain knowledge types and agent architectures, including injection via knowledge graphs and augmentation of the existing input encoding strategies. We perform all experiments in the ScienceWorld text-based game environment to illustrate the performance of various model configurations on challenging science-related instruction-following tasks. Our findings provide crucial insights into the development of effective natural language processing systems for interactive contexts.
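To make the idea of injecting knowledge by augmenting the input encoding more concrete, the following is a minimal sketch and not the paper's implementation. It assumes the agent consumes a single text string per step; the class name `KnowledgeAugmentedInput`, the bracketed tags (`[TASK]`, `[OBS]`, `[MEM]`, `[AFF]`), and the `max_memory` parameter are all hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class KnowledgeAugmentedInput:
    """Hypothetical container for one step of agent input (illustrative only)."""
    task: str
    observation: str
    # Memory of previously correct actions, oldest first (assumed format).
    correct_actions: List[str] = field(default_factory=list)
    # Affordances: object name -> things the object can be used for (assumed format).
    affordances: Dict[str, List[str]] = field(default_factory=dict)

    def encode(self, max_memory: int = 3) -> str:
        """Concatenate the task, observation, and injected knowledge into one string."""
        parts = [f"[TASK] {self.task}", f"[OBS] {self.observation}"]
        # Keep only the most recent correct actions to bound the input length.
        recent = self.correct_actions[-max_memory:] if max_memory > 0 else []
        if recent:
            parts.append("[MEM] " + " ; ".join(recent))
        # Sort object names so the encoding is deterministic.
        for obj in sorted(self.affordances):
            parts.append(f"[AFF] {obj}: " + ", ".join(self.affordances[obj]))
        return " ".join(parts)


step = KnowledgeAugmentedInput(
    task="boil water",
    observation="You are in the kitchen.",
    correct_actions=["open cupboard", "pick up pot"],
    affordances={"stove": ["heat"]},
)
encoded = step.encode()
```

The resulting string could then be passed to whatever text encoder the agent already uses. A knowledge-graph-based injection would instead add the same facts as graph nodes and edges, which this sketch does not cover.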