Languages can encode temporal subordination lexically, via subordinating conjunctions, and morphologically, by marking the relation on the predicate. Systematic cross-linguistic variation among the former can be studied by applying well-established token-based typological approaches to token-aligned parallel corpora. Variation among morphological means is much harder to tackle and is therefore more poorly understood, even though such means are predominant in several language groups. This paper explores variation in the expression of generic temporal subordination ('when'-clauses) among the languages of Latin America and the Caribbean, where morphological marking is particularly common. It presents probabilistic semantic maps computed on the basis of the languages of the region alone, which avoids bias towards the many languages of the world that exclusively use lexified connectors. The maps incorporate associations between character $n$-grams and English 'when'. The approach captures morphological clause-linkage devices in addition to lexified connectors, paving the way for larger-scale, strategy-agnostic analyses of typological variation in temporal subordination.
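To make the idea of associating character $n$-grams with English 'when' concrete, the following is a minimal sketch, not the procedure used in the paper. It assumes sentence-aligned pairs and scores each $n$-gram by pointwise mutual information (PMI) with the presence of 'when' in the aligned English sentence. The choice of PMI, the function names, the $n$-gram range, and the toy data (invented word forms from no real language) are all assumptions made for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=2, n_max=4):
    """Return the set of word-internal character n-grams in `text`."""
    grams = set()
    for word in text.lower().split():
        padded = f"<{word}>"  # boundary marks keep prefixes and suffixes distinct
        for n in range(n_min, n_max + 1):
            for i in range(len(padded) - n + 1):
                grams.add(padded[i:i + n])
    return grams


def when_associations(pairs, min_count=2):
    """Score n-grams by PMI with 'when' in the aligned English sentence.

    `pairs` is a list of (english_sentence, target_sentence) tuples.
    PMI is an assumed association measure, chosen here for simplicity.
    """
    total = len(pairs)
    gram_count = Counter()   # sentences containing the n-gram
    joint_count = Counter()  # sentences containing the n-gram AND aligned to 'when'
    when_total = 0
    for english, target in pairs:
        has_when = "when" in english.lower().split()
        when_total += has_when
        for gram in char_ngrams(target):
            gram_count[gram] += 1
            if has_when:
                joint_count[gram] += 1
    scores = {}
    if when_total == 0:
        return scores
    for gram, joint in joint_count.items():
        if gram_count[gram] < min_count:
            continue  # skip n-grams too rare to score reliably
        # PMI = log2( P(gram, when) / (P(gram) * P(when)) )
        scores[gram] = math.log2(joint * total / (gram_count[gram] * when_total))
    return scores


# Hypothetical toy corpus: invented forms in which "-zek" marks the 'when'-clause verb.
pairs = [
    ("when he came we ate", "mirazek tolan"),
    ("when she slept it rained", "dupazek firan"),
    ("he came yesterday", "miran osk"),
    ("it rained today", "firan ulk"),
]
scores = when_associations(pairs)
```

In this toy corpus the suffix-like $n$-gram `zek>` receives a positive score, whereas `an>`, which occurs in every sentence, scores zero. A realistic application would need far more data and, plausibly, a more robust association measure.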