The study of emergent communication is motivated by the goal of interactive artificial intelligence. While existing work focuses on communication about single objects or complex image scenes, we argue that communicating relationships between multiple objects is important for more realistic tasks but remains understudied. In this paper, we address this gap by focusing on emergent communication about positional relationships between two objects. We train agents in a referential game where observations contain two objects, and find that generalization is the major problem when positional relationships are involved. The key factor affecting the generalization ability of the emergent language is the input variation between Speaker and Listener, which we realize with a random image generator. Further, we find that the learned language generalizes well to a new multi-step MDP task in which the positional relationship describes the goal, and that it performs better than raw-pixel images as well as pre-trained image features, verifying the strong generalization ability of discrete sequences. We also show that transferring language from the referential game performs better in the new task than learning language directly in that task, suggesting potential benefits of pre-training in referential games. Overall, our experiments demonstrate the viability and merit of having agents learn to communicate positional relationships between multiple objects through emergent communication.