Human language, while aimed at conveying meaning, is inherently ambiguous. This ambiguity poses challenges for speech and language processing, but it also serves crucial communicative functions. Resolving ambiguity efficiently is both a desirable and a necessary capability. The lexical meaning of a word in context can be determined automatically by Word Sense Disambiguation (WSD) algorithms, but these rely on external knowledge that is often limited and biased toward English. When content is adapted to other languages, automated translations are frequently inaccurate, and a high degree of expert human validation is needed to ensure both accuracy and understanding. The current study addresses these limitations by introducing a new resource for Spanish WSD. It includes a sense inventory and a lexical dataset sourced from the Diccionario de la Lengua Española, which is maintained by the Real Academia Española. We also review current resources for Spanish WSD and report the metrics a state-of-the-art system obtains on them.