Recent years have seen a drastic rise in studies investigating colexification patterns, both in individual language families and across the languages of the world in general. Computational studies in particular have profited from the fact that colexification as a scientific construct is easy to operationalize, enabling scholars to infer colexification patterns from large collections of cross-linguistic data. Studies devoted to partial colexifications (colexification patterns that do not involve entire words, but rather various parts of words), however, have rarely been conducted so far. This is not surprising, since partial colexifications are harder to handle in computational approaches and can easily suffer from noise resulting from false positive matches. To address this problem, this study proposes new approaches to the handling of partial colexifications by (1) proposing new models with which partial colexification patterns can be represented, (2) developing new efficient methods and workflows that help to infer various types of partial colexification patterns from multilingual wordlists, and (3) illustrating how inferred patterns of partial colexifications can be computationally analyzed and interactively visualized.