As large pre-trained image-processing neural networks are being embedded in autonomous agents such as self-driving cars or robots, the question arises of how such systems can communicate with each other about the surrounding world, despite their different architectures and training regimes. As a first step in this direction, we systematically explore the task of \textit{referential communication} in a community of heterogeneous state-of-the-art pre-trained visual networks, showing that they can develop, in a self-supervised way, a shared protocol to refer to a target object among a set of candidates. This shared protocol can also be used, to some extent, to communicate about previously unseen object categories of different granularity. Moreover, a visual network that was not initially part of an existing community can learn the community's protocol with remarkable ease. Finally, we study, both qualitatively and quantitatively, the properties of the emergent protocol, providing some evidence that it is capturing high-level semantic features of objects.