The world's languages exhibit certain so-called typological or implicational universals; for example, languages with Subject-Object-Verb (SOV) word order typically use postpositions. Explaining the source of such biases is a key goal in linguistics. We study word-order universals through computational simulation with language models (LMs). Our experiments show that typologically typical word orders tend to be assigned lower perplexity by LMs with cognitively plausible biases: syntactic biases, specific parsing strategies, and memory limitations. This suggests that the interplay between these cognitive biases and predictability (perplexity) can explain many aspects of word-order universals. It also showcases the advantage of cognitively motivated LMs, which are typically employed in cognitive modeling, for the computational simulation of language universals.