A fundamental concern in linguistics is understanding how languages change, including changes in word order. Because the order of words in a sentence (i.e., the relative placement of Subject, Object, and Verb) is readily identifiable in most languages, it has been a productive field of study for decades (see Greenberg 1963; Dryer 2007; Hawkins 2014). A language's word order can nevertheless change over time, and there are competing explanations for such changes (Carnie and Guilfoyle 2000; Crisma and Longobardi 2009; Martins and Cardoso 2018; Dunn et al. 2011; Jager and Wahle 2021). This paper proposes a general, universal explanation for word order change based on a theory of communicative interaction (the Min-Max theory of language behavior) in which agents seek to minimize effort while maximizing information. Such an account unifies opposing findings from language processing (Piantadosi et al. 2011; Wasow 2022; Levy 2008) that make different predictions about how word order should be realized crosslinguistically. The marriage of the "efficiency" and "surprisal" approaches under the Min-Max theory is supported by evidence from a massive dataset of 1,942 language corpora tagged for parts of speech (Ring 2025), in which the average lengths of particular word classes correlate with word order, allowing basic word order to be predicted from diverse corpora. The general, universal pressure of word class length in corpora is shown to give a stronger explanation for word order realization than either genealogical or areal factors, highlighting the importance of language corpora for investigating such questions.