A key function of the lexicon is to express novel concepts as they emerge over time through a process known as lexicalization. The most common lexicalization strategies are the reuse and combination of existing words, but they have typically been studied separately in the areas of word meaning extension and word formation. Here we offer an information-theoretic account of how both strategies are constrained by a fundamental tradeoff between competing communicative pressures: word reuse tends to preserve the average length of word forms at the cost of less precision, while word combination tends to produce more informative words at the expense of greater word length. We test our proposal against a large dataset of reuse items and compounds that appeared in English, French and Finnish over the past century. We find that these historically emerging items achieve higher levels of communicative efficiency than hypothetical ways of constructing the lexicon, and both literal reuse items and compounds tend to be more efficient than their non-literal counterparts. These results suggest that reuse and combination are both consistent with a unified account of lexicalization grounded in the theory of efficient communication.
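The tradeoff described above can be made concrete with a toy calculation. The sketch below is only an illustration under stated assumptions and is not the measure used in this work: the word forms, the two meanings, the uniform need probabilities, the use of character count as word length, and the helper names `listener_surprisal` and `evaluate` are all hypothetical. It compares reusing an existing form for a new concept with coining a compound for it.

```python
import math

# Hypothetical need probabilities: how often each meaning must be expressed.
need = {"postal_mail": 0.5, "electronic_mail": 0.5}

# Strategy 1 (reuse): the existing form "mail" also covers the new concept.
reuse_lexicon = {"mail": ["postal_mail", "electronic_mail"]}

# Strategy 2 (combination): a compound is coined for the new concept.
combo_lexicon = {
    "mail": ["postal_mail"],
    "electronic mail": ["electronic_mail"],
}

def listener_surprisal(form, meaning, lexicon, need):
    """Bits lost when a listener decodes `meaning` from `form`: -log2 P(meaning | form).

    Assumes the listener weights the meanings of a form by their need probability.
    """
    total = sum(need[m] for m in lexicon[form])
    return -math.log2(need[meaning] / total)

def evaluate(lexicon, need):
    """Return (expected form length in characters, expected surprisal in bits)."""
    length = 0.0
    loss = 0.0
    for form, meanings in lexicon.items():
        for m in meanings:
            length += need[m] * len(form)
            loss += need[m] * listener_surprisal(form, m, lexicon, need)
    return length, loss

reuse_length, reuse_loss = evaluate(reuse_lexicon, need)
combo_length, combo_loss = evaluate(combo_lexicon, need)

print("reuse:       length", reuse_length, "loss", reuse_loss)
print("combination: length", combo_length, "loss", combo_loss)
```

In this toy setting, reuse keeps the expected form short but leaves the listener uncertain about which meaning is intended, while the compound removes that uncertainty at the price of a longer expected form.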