All poetic forms come from somewhere. Prosodic templates can be copied for generations, altered by individuals, imported from foreign traditions, or fundamentally changed under the pressures of language evolution. Yet these relationships are notoriously difficult to trace across languages and periods. This paper introduces an unsupervised method for detecting structural similarities in poems using local sequence alignment. The method encodes poetic texts as strings of prosodic features over a four-letter alphabet; these sequences are then aligned to derive a distance measure based on weighted symbol (mis)matches. Local alignment allows poems to be clustered according to emergent properties of their underlying prosodic patterns. We evaluate the method on a meter recognition task against strong baselines and demonstrate its potential for cross-lingual and historical research through three short case studies: 1) mutations in quantitative meter in classical Latin, 2) the European diffusion of the Renaissance hendecasyllable, and 3) a comparative alignment of modern meters in 18th- and 19th-century Czech, German, and Russian. We release an implementation of the algorithm as a Python package under an open license.
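The abstract does not specify the alphabet, the scoring weights, or how alignment scores become distances. The sketch below illustrates the general idea under stated assumptions. It uses standard Smith-Waterman local alignment with a linear gap penalty over a hypothetical prosodic alphabet (`S` = strong/long, `W` = weak/short, `B` = word boundary, `L` = line break), with invented weights and one possible score-to-distance normalization. It is not the released package's API.

```python
# Hypothetical 4-letter prosodic alphabet (an assumption, not the paper's encoding):
#   S = strong/long syllable, W = weak/short syllable, B = word boundary, L = line break.
MATCH = 2.0      # reward for identical symbols (assumed weight)
MISMATCH = -1.0  # penalty for differing symbols (assumed weight)
GAP = -1.5       # linear gap penalty (assumed weight)


def smith_waterman(a: str, b: str, match: float = MATCH,
                   mismatch: float = MISMATCH, gap: float = GAP) -> float:
    """Return the best local alignment score between sequences a and b."""
    # Keep only two rows of the dynamic-programming matrix; cells are floored
    # at 0, which is what makes the alignment local rather than global.
    prev = [0.0] * (len(b) + 1)
    best = 0.0
    for i in range(1, len(a) + 1):
        curr = [0.0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap          # gap in b
            left = curr[j - 1] + gap    # gap in a
            curr[j] = max(0.0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


def alignment_distance(a: str, b: str) -> float:
    """Map the local score to a distance in [0, 1] (one possible normalization)."""
    # No local alignment can score higher than the shorter sequence aligned
    # with itself, so dividing by that self-score bounds the ratio by 1.
    max_score = min(smith_waterman(a, a), smith_waterman(b, b))
    if max_score == 0:
        return 1.0
    return 1.0 - smith_waterman(a, b) / max_score


if __name__ == "__main__":
    iambic = "WSWSWSWSWS"    # iambic pentameter line, hypothetical encoding
    trochaic = "SWSWSWSWSW"  # trochaic line of the same length
    print(alignment_distance(iambic, iambic))    # identical lines
    print(alignment_distance(iambic, trochaic))  # shifted by one position
```

Local rather than global alignment rewards shared sub-patterns, such as a common run of alternating syllables, even when two poems differ in length or in where the pattern begins. For example, an iambic and a trochaic line of equal length come out close (distance of about 0.1 here) because they share a nine-symbol stretch. A side effect of this particular normalization is that a sequence fully contained in another receives distance 0. The paper's actual weighting may differ.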