Human languages provide efficient systems for expressing numerosities, but whether the sheer pressure to communicate is enough for numerical representations to arise in artificial agents, and whether the emergent codes resemble human numerals at all, remain open questions. We study two neural network-based agents that must communicate numerosities in a referential game using either discrete tokens or continuous sketches, thus exploring both symbolic and iconic representations. Without any pre-defined numeric concepts, the agents achieve high in-distribution communication accuracy in both channels and converge on high-precision symbol-meaning mappings. However, the emergent code is non-compositional: the agents fail to derive systematic messages for unseen numerosities, typically reusing the symbol of the highest trained numerosity (discrete channel) or collapsing extrapolated values onto a single sketch (continuous channel). We conclude that communication pressure alone suffices for precise transmission of learned numerosities, but that additional pressures are needed to yield compositional codes and generalisation abilities.