Zipf's law of abbreviation, namely the tendency of more frequent words to be shorter, has been viewed as a manifestation of compression, i.e. the minimization of the length of forms, a universal principle of natural communication. Although the claim that languages are optimized has become popular, attempts to measure the degree of optimization of languages have been rather scarce. Here we present two optimality scores that are dually normalized, namely, they are normalized with respect to both the minimum and the random baseline. We analyze the theoretical and statistical pros and cons of these and other scores. Harnessing the best score, we quantify for the first time the degree of optimality of word lengths in languages. The results indicate that languages are optimized to 62 or 67 percent on average (depending on the source) when word lengths are measured in characters, and to 65 percent on average when word lengths are measured in time. In general, spoken word durations are more optimized than written word lengths in characters. Our work paves the way for measuring the degree of optimality of the vocalizations or gestures of other species, and for comparing them against written, spoken, or signed human languages.