Languages are known to describe the world in diverse ways. Across lexicons, diversity is pervasive, appearing through phenomena such as lexical gaps and untranslatability. However, in computational resources such as multilingual lexical databases, diversity is hardly ever represented. In this paper, we introduce a method to enrich computational lexicons with content relating to linguistic diversity. The method is verified through two large-scale case studies on kinship terminology, a domain known to be diverse across languages and cultures: one case study deals with seven Arabic dialects, the other with three Indonesian languages. Our results, made available as browsable and downloadable computational resources, extend prior linguistic research on kinship terminology and provide insight into the extent of diversity even within linguistically and culturally close communities.