Reinforcement learning (RL) agents have previously been shown to derive simple approximate and exact-restricted numeral systems that resemble human ones (Carlsson, 2021). However, showing how more complex recursive numeral systems, such as that of English, could arise via a simple learning mechanism such as RL remains a major challenge. Here, we introduce an approach towards a mechanistic explanation of the emergence of efficient recursive numeral systems. We consider pairs of agents learning to communicate about numerical quantities through a meta-grammar that can be gradually modified over the course of their interactions. We find that the seminal meta-grammar of Hurford (Hurford, 1975) is not suitable for this application, as its optimization results in systems that deviate from standard conventions observed in human numeral systems. We propose a simple modification that addresses this issue. Using this slightly modified version of Hurford's meta-grammar, we demonstrate that our RL agents, shaped by pressures for efficient communication, can modify their lexicon towards Pareto-optimal configurations whose efficiency is comparable to that of human numeral systems.