An unaddressed challenge in multi-agent coordination is to enable AI agents to exploit the semantic relationships between the features of actions and the features of observations. Humans exploit these relationships intuitively. For instance, in the absence of a shared language, we might point to the object we want or hold up fingers to indicate how many objects we want. To address this challenge, we investigate how network architecture affects the propensity of learning algorithms to exploit these semantic relationships. On a procedurally generated coordination task, we find that attention-based architectures that jointly process featurized representations of observations and actions have a better inductive bias for learning intuitive policies. Through fine-grained evaluation and scenario analysis, we show that the resulting policies are human-interpretable. Moreover, these agents coordinate with people without training on any human data.
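The architectural idea above (attention over a joint, featurized representation of observations and actions) can be illustrated with a minimal sketch. This is not the authors' model. It assumes a single-head scaled dot-product self-attention layer with random, untrained weights, a hypothetical binary type flag that marks each token as an observation or an action, and a linear readout; the names `action_logits`, `init_params`, and the parameter keys are hypothetical. Each action is described by its own feature vector and enters the network as a token next to the observation tokens, so the policy outputs one logit per action token rather than one per fixed output index.

```python
import math
import random


def matvec(w, x):
    # w: matrix as a list of rows (out_dim x in_dim); x: vector of length in_dim.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over a list of token vectors."""
    q = [matvec(wq, t) for t in tokens]
    k = [matvec(wk, t) for t in tokens]
    v = [matvec(wv, t) for t in tokens]
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        # Weighted sum of value vectors, one output per input token.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)] for _ in range(rows)]


def init_params(feat_dim, hidden_dim, seed=0):
    # Hypothetical random initialization; a real agent would learn these weights.
    rng = random.Random(seed)
    in_dim = feat_dim + 1  # +1 for the observation/action type flag
    return {
        "wq": random_matrix(hidden_dim, in_dim, rng),
        "wk": random_matrix(hidden_dim, in_dim, rng),
        "wv": random_matrix(hidden_dim, in_dim, rng),
        "w_out": [rng.gauss(0.0, 1.0) for _ in range(hidden_dim)],
    }


def action_logits(obs_feats, act_feats, params):
    """Score each action by attending jointly over observation and action feature tokens.

    obs_feats and act_feats are lists of feature vectors of the same width.
    Returns one logit per action token.
    """
    n_obs = len(obs_feats)
    # Append a type flag: 0.0 for observation tokens, 1.0 for action tokens.
    tokens = [f + [0.0] for f in obs_feats] + [f + [1.0] for f in act_feats]
    hidden = self_attention(tokens, params["wq"], params["wk"], params["wv"])
    readout = params["w_out"]
    # Linear readout applied only to the hidden states of the action tokens.
    return [sum(r * h for r, h in zip(readout, hidden[n_obs + i])) for i in range(len(act_feats))]
```

For example, with two observed objects and three candidate actions described by 3-dimensional features:

```python
params = init_params(feat_dim=3, hidden_dim=8)
obs = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.5]]
acts = [[1.0, 0.0, 2.0], [0.0, 0.0, 1.0], [0.5, 1.0, 0.0]]
policy = softmax(action_logits(obs, acts, params))
```

Because the sketch uses no positional encoding, reordering the action tokens only reorders the logits, and the same parameters handle any number of actions. This plausibly reflects the inductive bias the abstract describes: an action whose features resemble an observed feature (here, the first action matches the first object) can attend to it directly, rather than the network having to learn an arbitrary mapping from output indices to meanings.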