Why is the past tense of English "go" the apparently unrelated "went"? Such alternations are frequent in languages. They aid neither communication nor learnability, yet they can be persistent, surviving over centuries or millennia. We present a multi-agent simulation of the emergence of morphological stem and inflection alternations. Alternate forms arise by phonological changes or, as with "go/went", from lexical alternatives associated with a subset of the population. When an agent 'hears' another agent use a novel form for a slot in the paradigm of a word (say, the past tense of "go"), they will with some probability adopt that form, possibly spreading its use to other slots in the paradigm that shared the same original form. Thus alternative forms can spread through the population and become entrenched as stem or inflectional marker alternants. Unlike many previous computational studies, our system allows for naturalistic lexical forms, realistic phonological rules, lexicons with hundreds or thousands of entries, and agent populations in the tens or hundreds. It supports several network topologies, diffusion patterns, and agent adoption policies. One issue with such simulations is evaluation: how realistic is the resulting morphology compared to that of real languages? We introduce the AI Historical Linguist, a novel Large Language Model-driven system that models a debate between two historical linguists. We use this system to compare a set of real language morphologies, disguised morphologies, and experimentally evolved morphologies. The results suggest that the factors favoring more plausible morphologies include scale-free social networks and random Bernoulli adoption of forms. We also present three case studies modeling attested historical changes, allowing us to test what might have happened if history had been different. All code and data are released.
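
The adoption mechanism described above can be illustrated with a minimal sketch. It is not the released code: the function names (`scale_free_graph`, `hear`, `simulate`), the slot labels, the toy forms "go" and "wen", and all parameter values are assumptions made for illustration. The sketch builds a scale-free network by Barabási-Albert-style preferential attachment, then lets a listener adopt a form it does not currently use with a Bernoulli probability `p_adopt`, spreading it with probability `p_spread` to other slots that shared the listener's previous form.

```python
import random

# Hypothetical one-word paradigm: every slot starts with the same stem.
BASE = {"PRS": "go", "PST": "go", "PTCP": "go"}


def scale_free_graph(n, m, rng):
    """Preferential attachment: each new node links to m distinct existing
    nodes, chosen with probability proportional to their degree."""
    neighbors = {i: set() for i in range(n)}
    targets = []  # each node appears once per incident edge
    for i in range(m + 1):  # fully connected seed of m + 1 nodes
        for j in range(i):
            neighbors[i].add(j)
            neighbors[j].add(i)
            targets += [i, j]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))
        for old in chosen:
            neighbors[new].add(old)
            neighbors[old].add(new)
            targets += [new, old]
    return neighbors


def hear(paradigm, slot, form, p_adopt, p_spread, rng):
    """Bernoulli adoption: take over `form` for `slot` with probability
    p_adopt, then possibly spread it to slots that shared the old form."""
    old = paradigm[slot]
    if form == old or rng.random() >= p_adopt:
        return False
    paradigm[slot] = form
    for other in paradigm:
        if other != slot and paradigm[other] == old and rng.random() < p_spread:
            paradigm[other] = form
    return True


def simulate(n_agents=50, m=2, steps=2000, p_adopt=0.3, p_spread=0.1, seed=0):
    """Run random speaker-listener interactions along the network edges."""
    rng = random.Random(seed)
    graph = scale_free_graph(n_agents, m, rng)
    agents = [dict(BASE) for _ in range(n_agents)]
    # Assumption: a tenth of the population starts with the lexical alternative.
    for a in rng.sample(range(n_agents), n_agents // 10):
        agents[a]["PST"] = "wen"
    for _ in range(steps):
        speaker = rng.randrange(n_agents)
        listener = rng.choice(sorted(graph[speaker]))
        slot = rng.choice(list(BASE))
        hear(agents[listener], slot, agents[speaker][slot], p_adopt, p_spread, rng)
    return agents


if __name__ == "__main__":
    final = simulate()
    share = sum(a["PST"] == "wen" for a in final) / len(final)
    print(f"share of agents with past stem 'wen': {share:.2f}")
```

In this toy version any form the listener does not already use counts as novel, so the original form can also win back a slot. A fuller model would add phonological rules, many lexical entries, and alternative adoption policies, as the abstract describes.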