The Strahler number was originally proposed to characterize the complexity of river bifurcation and has since found various applications. This article proposes a method for computing upper and lower limits of the Strahler number for the tree structures of natural language sentences. Empirical measurements on grammatically annotated data show that the Strahler number of natural language sentences is almost always 3 or 4, similar to the case of river bifurcation reported by Strahler (1957). From the theory behind the number, we show that it is one kind of lower limit on the amount of memory required to process sentences. We consider the Strahler number to provide reasoning that explains reports showing that the number of memory areas required for parsing sentences is 3 to 4 (Schuler et al., 2010), as well as reports indicating a psychological "magical number" of 3 to 5 (Cowan, 2001). Analytical and empirical analysis shows that the Strahler number is not constant but grows logarithmically with sentence length; therefore, the Strahler number of sentences derives from the range of sentence lengths. Furthermore, the Strahler number is no different for random trees, which could suggest that its origin is not specific to natural language.
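For readers unfamiliar with the measure, the following is a minimal sketch of the standard Strahler number on a binary tree. It is not the upper and lower limit computation proposed in this article. The nested-tuple tree encoding, the function name `strahler`, and the example trees are assumptions made purely for illustration. A leaf has Strahler number 1; an internal node whose two children have equal numbers gets that number plus 1, and otherwise takes the larger of the two.

```python
def strahler(tree):
    """Strahler number of a binary tree.

    Assumed encoding (illustrative only): an internal node is a 2-tuple
    (left, right); anything that is not a tuple is treated as a leaf.
    """
    if not isinstance(tree, tuple):
        return 1  # a leaf has Strahler number 1
    left, right = tree
    a, b = strahler(left), strahler(right)
    # Equal children raise the number by one; otherwise the larger one wins.
    return a + 1 if a == b else max(a, b)


# A purely right-branching tree stays at 2 regardless of its length.
chain = ("the", ("quick", ("brown", "fox")))
# A perfectly balanced tree with 4 leaves reaches 3.
balanced = (("a", "b"), ("c", "d"))

print(strahler(chain))
print(strahler(balanced))
```

Under this definition, a binary tree needs at least 2^(k-1) leaves to reach Strahler number k, which is consistent with the logarithmic growth noted above.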