This study addresses the task of Unknown Sense Detection in English and Swedish. The objective of this task is to determine whether the meaning of a particular word usage is documented in a dictionary. To this end, we compare sense entries with word usages from modern and historical corpora using a pre-trained Word-in-Context embedder, which allows us to model the task in a few-shot scenario. We additionally use human annotations on the target corpora to tune hyperparameters, and we evaluate our models using 5-fold cross-validation. Compared to a random sample from a corpus, our model considerably increases the number of detected word usages with non-recorded senses.
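The comparison of usages against dictionary sense entries can be illustrated with a minimal sketch. It is not the method used in this study: it assumes that a usage and each sense entry have already been embedded as vectors (the Word-in-Context embedder itself is not shown), that a usage is flagged as an unknown sense when its highest cosine similarity to any sense entry falls below a threshold, and that this threshold is the only hyperparameter, tuned by F1 on the annotated data. All function names, the decision rule, the choice of F1, and the toy scores are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def max_sense_similarity(usage_vec: Vector, sense_vecs: Sequence[Vector]) -> float:
    """Similarity of a usage to its closest dictionary sense entry."""
    return max(cosine(usage_vec, s) for s in sense_vecs)


def is_unknown_sense(usage_vec: Vector, sense_vecs: Sequence[Vector], threshold: float) -> bool:
    """Assumed decision rule: no recorded sense is similar enough to the usage."""
    return max_sense_similarity(usage_vec, sense_vecs) < threshold


def f1(pred: Sequence[bool], gold: Sequence[bool]) -> float:
    """F1 for the positive class 'unknown sense'."""
    tp = sum(p and g for p, g in zip(pred, gold))
    fp = sum(p and not g for p, g in zip(pred, gold))
    fn = sum(g and not p for p, g in zip(pred, gold))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def tune_threshold(scores: Sequence[float], labels: Sequence[bool],
                   candidates: Sequence[float]) -> float:
    """Pick the candidate threshold with the best F1 on the given annotations."""
    return max(candidates, key=lambda t: f1([s < t for s in scores], labels))


def cross_validate(scores: Sequence[float], labels: Sequence[bool],
                   candidates: Sequence[float], k: int = 5) -> float:
    """Mean held-out F1 over k folds; the threshold is tuned on the other folds."""
    folds = [list(range(i, len(scores), k)) for i in range(k)]
    results = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(scores)) if i not in held]
        t = tune_threshold([scores[i] for i in train],
                           [labels[i] for i in train], candidates)
        pred = [scores[i] < t for i in held_out]
        results.append(f1(pred, [labels[i] for i in held_out]))
    return sum(results) / k


# Toy data (made up): max sense similarity per usage, and whether human
# annotators judged the usage to have a sense missing from the dictionary.
SCORES = [0.9, 0.2, 0.8, 0.1, 0.85, 0.22, 0.95, 0.15, 0.7, 0.25]
LABELS = [s < 0.5 for s in SCORES]
CANDIDATES = [i / 10 for i in range(1, 10)]
MEAN_F1 = cross_validate(SCORES, LABELS, CANDIDATES, k=5)
```

In practice, the similarity scores would come from the embedder applied to real usages and sense definitions or examples, and a ranking by lowest `max_sense_similarity` could be used to prioritise usages for manual inspection instead of drawing a random sample.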