The emergence of mathematical concepts, such as number systems, is an understudied area in AI for mathematics and reasoning. It has previously been shown (Carlsson et al., 2021) that, using reinforcement learning (RL), agents can derive simple approximate and exact-restricted numeral systems. However, it remains a major challenge to show how more complex recursive numeral systems, similar to the one used in English, could arise via a simple learning mechanism such as RL. Here, we introduce an approach towards a mechanistic explanation of the emergence of recursive number systems, in which an RL agent directly optimizes a lexicon under a given meta-grammar. Using a slightly modified version of the seminal meta-grammar of Hurford (1975), we demonstrate that our RL agent can effectively modify the lexicon towards Pareto-optimal configurations that are comparable to those observed in human numeral systems.