Explicit communication among humans is key to coordination and learning. Social learning, which uses cues from experts, can benefit greatly from explicit communication to align heterogeneous policies, reduce sample complexity, and solve partially observable tasks. Emergent communication, a type of explicit communication, studies the creation of an artificial language that encodes high task-utility messages directly from data. However, in most cases, emergent communication sends insufficiently compressed messages that carry little or no information and may not be understandable to a third-party listener. This paper proposes an unsupervised method based on the information bottleneck that captures both referential complexity and task-specific utility in order to adequately explore sparse social communication scenarios in multi-agent reinforcement learning (MARL). We show that our model is able to i) develop a natural-language-inspired lexicon of messages that is independently composed of a set of emergent concepts, which span the observations and intents with minimal bits, ii) develop communication to align the action policies of heterogeneous agents with dissimilar feature models, and iii) learn a communication policy from watching an expert's action policy, which we term 'social shadowing'.