Languages are known to describe the world in diverse ways. Across lexicons, diversity is pervasive, appearing through phenomena such as lexical gaps and untranslatability. However, in computational resources such as multilingual lexical databases, diversity is hardly ever represented. In this paper, we introduce a method to enrich computational lexicons with content relating to linguistic diversity. The method is verified through two large-scale case studies on kinship terminology, a domain known to be diverse across languages and cultures: one case study covers seven Arabic dialects, and the other covers three Indonesian languages. Our results, made available as browsable and downloadable computational resources, extend prior linguistics research on kinship terminology and provide insight into the extent of diversity even within linguistically and culturally close communities.