As large pre-trained image-processing neural networks are being embedded in autonomous agents such as self-driving cars or robots, the question arises of how such systems can communicate with each other about the surrounding world, despite their different architectures and training regimes. As a first step in this direction, we systematically explore the task of \textit{referential communication} in a community of heterogeneous state-of-the-art pre-trained visual networks, showing that they can develop, in a self-supervised way, a shared protocol to refer to a target object among a set of candidates. This shared protocol can also be used, to some extent, to communicate about previously unseen object categories of different granularity. Moreover, a visual network that was not initially part of an existing community can learn the community's protocol with remarkable ease. Finally, we study, both qualitatively and quantitatively, the properties of the emergent protocol, providing some evidence that it is capturing high-level semantic features of objects.