Emergent communication among artificial agents interacting in simulated environments has attracted a great deal of research. Recent studies have revealed that emergent languages generally do not follow the compositionality patterns of natural language. To address this, existing works have proposed limited channel capacity as an important constraint for learning highly compositional languages. In this paper, we show that this constraint is not a sufficient condition and propose an intrinsic reward framework for improving compositionality in emergent communication. We use a reinforcement learning setting with two agents, a \textit{task-aware} Speaker and a \textit{state-aware} Listener, that must communicate to perform a set of tasks. Through experiments on three different referential game setups, including a novel environment, gComm, we show that intrinsic rewards improve compositionality scores to $\approx \mathbf{1.5}$--$\mathbf{2}$ times those of existing frameworks that rely on limited channel capacity.