Visual Word Sense Disambiguation (VWSD) is the task of finding the image that most accurately depicts the correct sense of a target word in a given context. Existing image-text matching models often struggle to recognize polysemous words. This paper introduces an unsupervised VWSD approach that uses gloss information from an external lexical knowledge base, in particular sense definitions. Specifically, we employ Bayesian inference to incorporate sense definitions when the sense of the answer is not given. In addition, to ameliorate the out-of-dictionary (OOD) issue, we propose context-aware definition generation with GPT-3. Experimental results show that our Bayesian inference-based approach significantly improves VWSD performance. Moreover, our context-aware definition generation substantially improves performance on OOD examples, outperforming the existing definition generation method. We will release the source code as soon as possible.
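The abstract does not give the exact formulation of the Bayesian inference step. One plausible reading is to treat the unknown sense as a latent variable and marginalize over it: P(image | context) = Σ_k P(image | definition_k) · P(definition_k | context). The sketch below illustrates that idea under stated assumptions. The similarity scores are assumed to come from some image-text matching model (e.g. a CLIP-style encoder), the softmax normalization is an assumed choice, and the function names `softmax` and `rank_images` are hypothetical. It is not the authors' implementation.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over raw similarity scores (assumed normalization)."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_images(context_def_scores: Sequence[float],
                image_def_scores: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical helper: score candidate images by marginalizing over senses.

    context_def_scores[k]  : similarity between the context and sense definition k
    image_def_scores[i][k] : similarity between candidate image i and definition k
    Returns P(image i | context) for every candidate image.
    """
    num_senses = len(context_def_scores)
    num_images = len(image_def_scores)

    # Posterior over senses given the context: P(d_k | c)
    p_sense = softmax(context_def_scores)

    # Likelihood of each image under each sense: P(i | d_k), normalized over images
    p_image_given_sense = []
    for k in range(num_senses):
        column = [image_def_scores[i][k] for i in range(num_images)]
        p_image_given_sense.append(softmax(column))

    # Marginalize the latent sense: P(i | c) = sum_k P(i | d_k) * P(d_k | c)
    return [
        sum(p_image_given_sense[k][i] * p_sense[k] for k in range(num_senses))
        for i in range(num_images)
    ]


if __name__ == "__main__":
    # Made-up scores for the target word "bank" with two senses:
    # sense 0 = financial institution, sense 1 = river edge.
    # The context is assumed to favor the river sense.
    context_scores = [0.1, 2.0]
    image_scores = [
        [3.0, 0.0],  # image 0: a bank building
        [0.0, 3.0],  # image 1: a riverbank
        [0.5, 0.5],  # image 2: an unrelated tree
    ]
    probs = rank_images(context_scores, image_scores)
    best = max(range(len(probs)), key=lambda i: probs[i])
    print(best)  # expected: 1 (the riverbank image)
```

Because each per-sense distribution and the sense posterior each sum to one, the resulting image scores also form a proper distribution. The sense definitions fed in could come from the lexical knowledge base or, for out-of-dictionary words, from generated definitions, as the abstract describes.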