Recent research shows that punctuation patterns in texts exhibit universal features across languages. In Western classical literature, the distances between consecutive punctuation marks follow a discrete Weibull distribution, a model commonly used in survival analysis. This analysis is extended here to Chinese literature, represented by three notable contemporary works. Zipf's law holds for the Chinese texts much as it does for Western ones, and in both cases taking punctuation into account improves adherence to the law. The distances between punctuation marks in the Chinese texts also follow the discrete Weibull model, although large distances are less frequent than in English translations. Distances between sentence-ending punctuation marks, which represent sentence lengths, deviate more from this model, reflecting greater flexibility in sentence length. This variability supports the formation of complex, multifractal sentence structures, which are particularly evident in Gao Xingjian's "Soul Mountain". These findings indicate that Chinese and Western texts share universal patterns in punctuation and word distributions, and that these regularities apply broadly across languages.
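To make the quantities above concrete, the following minimal sketch computes distances between consecutive punctuation marks and evaluates a discrete Weibull model. Several choices here are assumptions and are not taken from the text: distances are counted in words, the punctuation set and the regex tokenizer are hypothetical, and the distribution uses the common parametrization P(k) = (1-p)^((k-1)^β) - (1-p)^(k^β) for k = 1, 2, .... For Chinese text, the tokenizer would have to be replaced by a word segmenter, which is not shown.

```python
import re

# Hypothetical punctuation set; the text does not list which marks are counted.
PUNCTUATION = set(".,;:!?…") | set("。,;:!?、")


def tokenize(text):
    """Naive tokenizer (assumption): runs of word characters, or single symbols."""
    return re.findall(r"\w+|[^\w\s]", text)


def punctuation_distances(tokens, marks=PUNCTUATION):
    """Number of words between consecutive punctuation marks.

    Words after the last mark are ignored, and directly adjacent
    marks (zero words in between) do not produce a distance.
    """
    distances = []
    run = 0
    for tok in tokens:
        if tok in marks:
            if run > 0:
                distances.append(run)
            run = 0
        else:
            run += 1
    return distances


def discrete_weibull_pmf(k, p, beta):
    """P(X = k) for k = 1, 2, ... under the assumed parametrization."""
    q = 1.0 - p
    return q ** ((k - 1) ** beta) - q ** (k ** beta)


def hazard(k, p, beta):
    """P(X = k | X >= k): chance that a mark follows the k-th word,
    given that no mark followed the previous k - 1 words."""
    q = 1.0 - p
    return 1.0 - q ** (k ** beta - (k - 1) ** beta)


if __name__ == "__main__":
    sample = "Well, here we are. Nothing more, nothing less!"
    print(punctuation_distances(tokenize(sample)))
    # With beta = 1 the model reduces to a geometric distribution
    # (constant hazard); beta > 1 makes the hazard grow with distance.
    for k in (1, 5, 10):
        print(k, hazard(k, p=0.2, beta=1.0), hazard(k, p=0.2, beta=1.3))
```

In this parametrization, β > 1 means a punctuation mark becomes more likely the longer the current run of words already is, which is the survival-analysis reading of the model. Restricting `marks` to sentence-ending punctuation would yield sentence lengths instead.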