Visual Referential Games Further the Emergence of Disentangled Representations

Natural languages are powerful tools wielded by human beings to communicate information. Among their desirable properties, compositionality has been the main focus in the context of referential games and their variants, as it promises to grant greater systematicity to the agents that wield it. The concept of disentanglement has been shown to be of paramount importance to learned representations that generalise well in deep learning, and is thought to be a necessary condition for systematicity. Thus, this paper investigates how compositionality at the level of the emerging languages, disentanglement at the level of the learned representations, and systematicity relate to each other in the context of visual referential games. Firstly, we find that visual referential games based on the Obverter architecture outperform the state-of-the-art unsupervised learning approach on many major disentanglement metrics. Secondly, we expand the previously proposed Positional Disentanglement (PosDis) metric for compositionality to (re-)incorporate some concerns pertaining to the informativeness and completeness features found in the Mutual Information Gap (MIG) disentanglement metric it stems from. This extension allows for further discrimination between the different kinds of compositional languages that emerge in the context of Obverter-based referential games, in a way that neither the referential game accuracy nor previous metrics were able to capture. Finally, we investigate whether the resulting (emergent) systematicity, as measured by zero-shot compositional learning tests, correlates with any of the disentanglement and compositionality metrics proposed so far. Throughout the training process, statistically significant correlation coefficients can be found, both positive and negative, depending on the moment of measurement.
