The Strahler number was originally proposed to characterize the complexity of river bifurcation and has since found various applications. This article proposes a method for computing upper and lower limits of the Strahler number for the tree structures of natural language sentences. Empirical measurements on grammatically annotated data show that the Strahler number of natural language sentences is about 3 or 4, similar to the case of river bifurcation reported by Strahler (1957). From the theory behind the number, we show that it constitutes a kind of lower limit on the amount of memory required to process sentences. We argue that the Strahler number explains reports that parsing sentences requires 3 to 4 memory areas (Abney and Johnson, 1991; Schuler et al., 2010), as well as reports of a psychological "magical number" of 3 to 5 (Cowan, 2001). Analytical and empirical results show that the Strahler number is not constant but grows logarithmically with sentence length; therefore, the Strahler number observed for sentences derives from the range of sentence lengths. Furthermore, the Strahler number of random trees does not differ from that of sentences, which suggests that its origin may not be specific to natural language.
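The Strahler number is computed bottom-up over a rooted tree. The following is a minimal Python sketch, assuming the standard Horton-Strahler rule: leaves get 1, and an internal node takes the maximum of its children's values, plus 1 when at least two children share that maximum. The names `Node`, `strahler`, and `complete_binary_tree` are hypothetical illustrations. For nodes with more than two children, the result depends on how such nodes are binarized. This sketch does not reproduce the article's procedure for the upper and lower limits.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical rooted-tree node; a node without children is a leaf."""
    children: List["Node"] = field(default_factory=list)


def strahler(node: Node) -> int:
    """Horton-Strahler number, using the common rule for multi-child nodes."""
    if not node.children:
        return 1  # every leaf has Strahler number 1
    values = [strahler(child) for child in node.children]
    top = max(values)
    # The number increases only when at least two children share the maximum.
    return top + 1 if values.count(top) >= 2 else top


def complete_binary_tree(depth: int) -> Node:
    """Hypothetical helper: a complete binary tree with 2**depth leaves."""
    if depth == 0:
        return Node()
    return Node([complete_binary_tree(depth - 1), complete_binary_tree(depth - 1)])


if __name__ == "__main__":
    # A caterpillar tree (every internal node has a leaf child) stays at 2.
    caterpillar = Node([Node(), Node([Node(), Node([Node(), Node()])])])
    print(strahler(caterpillar))  # prints 2
    # A complete binary tree with 2**d leaves has Strahler number d + 1.
    for d in range(5):
        print(d, strahler(complete_binary_tree(d)))
```

Under this rule, reaching value k requires at least 2^(k-1) leaves. A tree with n leaves therefore has a Strahler number of at most ⌊log2 n⌋ + 1, which is one way to see why the number can grow only logarithmically with tree size.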