Sign language lexicographers construct bilingual dictionaries by establishing word-to-sign mappings, but polysemous and homonymous words that correspond to different signs in different contexts are often underrepresented in these mappings. A usage-based approach that examines how word senses map to signs can identify mappings absent from current dictionaries and thereby enrich lexicographic resources. We pursue this approach for German and German Sign Language (Deutsche Gebärdensprache, DGS), manually annotating 1,404 word use-to-sign ID mappings derived from 32 words from the German Word Usage Graph (D-WUG) and 49 signs from the Digital Dictionary of German Sign Language (DW-DGS). We identify three correspondence types: Type 1 (one-to-many), Type 2 (many-to-one), and Type 3 (one-to-one), plus No Match cases. We evaluate two computational methods: Exact Match (EM) and Semantic Similarity (SS) using SBERT embeddings. SS substantially outperforms EM overall (88.52% vs. 71.31%), with a particularly large gain for Type 1 (+52.1 pp). Our work establishes the first annotated dataset for cross-modal sense correspondence and reveals which correspondence patterns are computationally identifiable. Our code and dataset are publicly available.
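The contrast between the two evaluated methods can be illustrated with a minimal sketch. It is not the released code, and the abstract does not specify how EM or SS are implemented, so everything here is an assumption: EM is read as a case-insensitive lookup of the word among each sign's glosses, and SS as picking the sign whose description is most similar to the word usage under cosine similarity. To keep the example dependency-free, `toy_embed` is a bag-of-words stand-in for SBERT embeddings; the sign IDs, glosses, descriptions, and the `threshold` used to produce a No Match outcome are all hypothetical.

```python
import math
from collections import Counter


def exact_match(word, sign_glosses):
    """Assumed EM reading: return IDs of signs whose gloss list contains the word."""
    target = word.casefold()
    return [
        sign_id
        for sign_id, glosses in sign_glosses.items()
        if target in (g.casefold() for g in glosses)
    ]


def toy_embed(text):
    """Bag-of-words vector. A stand-in for an SBERT sentence embedding."""
    return Counter(text.casefold().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_sign(usage, sign_descriptions, embed=toy_embed, threshold=0.1):
    """Assumed SS reading: most similar sign for a usage, or None (No Match)."""
    usage_vec = embed(usage)
    score, sign_id = max(
        (cosine(usage_vec, embed(desc)), sid)
        for sid, desc in sign_descriptions.items()
    )
    return sign_id if score >= threshold else None


# Hypothetical data: one homonymous word, two candidate signs.
glosses = {"S1": ["Bank"], "S2": ["Bank", "Geldinstitut"], "S3": ["Stuhl"]}
descriptions = {
    "S1": "a long seat in a park",
    "S2": "an institution that keeps money",
}

# EM returns every sign glossed with the word and cannot tell the senses apart.
print(exact_match("Bank", glosses))
# SS uses the usage context to choose one sign, or reports No Match.
print(best_sign("she sat on the bank in the park", descriptions))
print(best_sign("the river bank flooded", descriptions))
```

Under these assumptions, a one-to-many case such as the hypothetical "Bank" shows why context-sensitive similarity can help where string matching cannot: EM returns both candidate signs for every usage, whereas SS scores each usage separately. With a real sentence encoder, only the `embed` argument and the `cosine` helper would need to be swapped for dense-vector equivalents.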