Words are fundamental linguistic units that connect thoughts and things through meaning. However, words do not appear independently in a text sequence: syntactic rules induce correlations among neighboring words. Using an ordinal pattern approach, we analyze the statistical connections between words in 11 major languages. We find that the diverse ways in which languages express relations between words give rise to distinctive distributions of pattern structures. Furthermore, fluctuations in these pattern distributions within a given language allow us to determine both the historical period in which a text was written and its author. Taken together, our results highlight the relevance of ordinal time series analysis to linguistic typology, historical linguistics and stylometry.
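To make the ordinal pattern idea concrete, here is a minimal sketch of a generic Bandt-Pompe-style construction. The abstract does not say how words are turned into numbers, so this example *assumes*, purely for illustration, that each word is replaced by its frequency rank within the text. The function names (`frequency_ranks`, `ordinal_pattern`, `pattern_distribution`, `normalized_entropy`) and the embedding dimension `d = 3` are hypothetical choices, not the authors' pipeline. Each window of `d` consecutive values is mapped to the permutation that sorts it. The relative frequencies of the `d!` possible permutations form the pattern distribution, and its normalized Shannon entropy gives a single summary number.

```python
import math
from collections import Counter
from itertools import permutations


def frequency_ranks(words):
    # Hypothetical word-to-number mapping: rank 1 = most frequent word in this text.
    # Words with equal frequency are ranked alphabetically so the mapping is deterministic.
    freq = Counter(words)
    ordered = sorted(freq, key=lambda w: (-freq[w], w))
    rank = {w: r for r, w in enumerate(ordered, start=1)}
    return [rank[w] for w in words]


def ordinal_pattern(window):
    # Indices of the window listed in increasing order of their values.
    # Python's sort is stable, so tied values keep their positional order.
    return tuple(sorted(range(len(window)), key=lambda i: window[i]))


def pattern_distribution(series, d=3):
    # Relative frequency of every one of the d! ordinal patterns,
    # taken over all sliding windows of length d (patterns never seen get 0).
    windows = [series[i:i + d] for i in range(len(series) - d + 1)]
    counts = Counter(ordinal_pattern(w) for w in windows)
    total = len(windows)
    return {p: counts.get(p, 0) / total for p in permutations(range(d))}


def normalized_entropy(dist):
    # Shannon entropy of the pattern distribution divided by log(d!), so it lies in [0, 1].
    h = -sum(p * math.log(p) for p in dist.values() if p > 0)
    return h / math.log(len(dist))


if __name__ == "__main__":
    words = "the cat sat on the mat and the dog sat".split()
    series = frequency_ranks(words)
    dist = pattern_distribution(series, d=3)
    print(series)
    for pattern, prob in sorted(dist.items()):
        print(pattern, round(prob, 3))
    print("normalized entropy:", round(normalized_entropy(dist), 3))
```

Comparing such distributions across languages, historical periods or authors is the kind of analysis the abstract describes. In practice one would use much longer texts, and possibly a different word-to-number mapping and a larger `d`.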