Burrows' Delta was introduced in 2002 and has proven to be an effective tool for authorship attribution. So far it has been tested mainly on texts in European languages. Although these languages differ, they mostly belong to the same grammatical type and use the same graphic principle to convey speech in writing: a phonemic alphabet with words separated by spaces. The question I address in this article is how well this attribution method works on texts in a language with a different grammatical structure and a script based on different principles. Fewer studies have examined the effectiveness of Delta on Chinese texts than on texts in European languages. I believe that this lack of attention to Delta among sinologists stems from the structure of the scholarly field devoted to medieval Chinese poetry. In this study, clustering based on intertextual distances worked flawlessly: the samples of each author were most similar to one another, and Delta never confused different poets. Although I took an unconventional approach and applied Delta to a language poorly suited to it, the method proved effective. Tang dynasty poets are correctly identified using Delta, and the empirical pattern observed for authors writing in European standard languages is confirmed once again.
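For readers unfamiliar with the method, the Delta computation can be sketched roughly as follows. This is a minimal illustration, not the procedure used in the article: it assumes single characters as features (a common workaround for a script without spaces between words), the population standard deviation for the z-scores, and a hypothetical function name `burrows_delta`. The sample strings are invented toy data, not real poems.

```python
from collections import Counter
from statistics import mean, pstdev


def char_tokens(text):
    """Split a text into single characters, dropping whitespace and punctuation.

    Assumption: characters stand in for words as the unit of analysis.
    """
    return [ch for ch in text if ch.isalnum()]


def burrows_delta(texts, n_features=100):
    """Return (labels, matrix) of pairwise Delta distances.

    texts: dict mapping a sample label to its raw text (each non-empty).
    """
    labels = list(texts)
    tokenized = [char_tokens(texts[label]) for label in labels]
    if any(len(tokens) == 0 for tokens in tokenized):
        raise ValueError("every sample must contain at least one character")

    # 1. Take the most frequent features of the corpus as a whole.
    corpus_counts = Counter()
    for tokens in tokenized:
        corpus_counts.update(tokens)
    vocab = [feat for feat, _ in corpus_counts.most_common(n_features)]

    # 2. Relative frequency of each feature in each sample.
    freqs = []
    for tokens in tokenized:
        counts = Counter(tokens)
        freqs.append([counts[feat] / len(tokens) for feat in vocab])

    # 3. Standardize each feature across samples (z-scores).
    z = [[0.0] * len(vocab) for _ in labels]
    for j in range(len(vocab)):
        column = [row[j] for row in freqs]
        mu, sigma = mean(column), pstdev(column)
        for i in range(len(labels)):
            # A feature with no variation carries no information.
            z[i][j] = (column[i] - mu) / sigma if sigma > 0 else 0.0

    # 4. Delta is the mean absolute difference between z-score profiles.
    n = len(labels)
    delta = [
        [sum(abs(a - b) for a, b in zip(z[i], z[k])) / len(vocab)
         for k in range(n)]
        for i in range(n)
    ]
    return labels, delta


# Toy usage with invented strings and hypothetical labels.
samples = {
    "A1": "山山山水水月",
    "A2": "山山水水水月",
    "B1": "花花花酒酒月",
    "B2": "花花酒酒酒月",
}
labels, distances = burrows_delta(samples, n_features=5)
```

The resulting distance matrix is what a clustering step would take as input; samples by the same author are expected to have the smallest mutual distances.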