Multimodal entity linking (MEL) is the task of matching textual and visual mentions of entities in unstructured data to their corresponding entities in a knowledge base (KB). To be effective in large-scale practical settings, MEL systems must meet three objectives: high linking accuracy, computational efficiency, and storage efficiency, i.e., a compact yet efficient index of the KB. In this paper, we highlight that state-of-the-art systems fail to satisfy these three requirements simultaneously. To meet this three-fold objective, we propose FAST-MEL, a lightweight encoder-based MEL solution that relies on a novel, compact, fixed-size vectorized representation of both the textual and visual information of each entity or mention. FAST-MEL matches the accuracy of the best systems while running three orders of magnitude faster. It also consumes one order of magnitude less storage than the fastest systems.
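To make the idea of linking through fixed-size vectors concrete, the following is a minimal sketch and not the FAST-MEL implementation. It assumes that each entity and each mention already has a text embedding and an image embedding, fuses them by normalized concatenation (an illustrative choice, not one stated in the abstract), and links a mention to the KB entry with the highest cosine similarity. The function names, the entity identifiers `Q1` and `Q2`, and the toy vectors are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(text_vec, image_vec):
    """Hypothetical fusion: normalize each modality, concatenate, renormalize.

    The output length is always len(text_vec) + len(image_vec), so every
    entity and mention occupies the same fixed amount of storage.
    """
    return l2_normalize(l2_normalize(text_vec) + l2_normalize(image_vec))


def link(mention_vec, index):
    """Return the entity id whose vector has the highest dot product.

    All vectors are unit length, so the dot product is the cosine similarity.
    """
    def score(entity_id):
        return sum(m * e for m, e in zip(mention_vec, index[entity_id]))
    return max(index, key=score)


def index_size_bytes(num_entities, dim, bytes_per_value=4):
    """Storage for a dense index holding one fixed-size vector per entity."""
    return num_entities * dim * bytes_per_value


# Toy embeddings (placeholders, not outputs of any real encoder).
kb_index = {
    "Q1": fuse([1.0, 0.0], [0.9, 0.1]),
    "Q2": fuse([0.0, 1.0], [0.1, 0.9]),
}
mention = fuse([0.8, 0.2], [1.0, 0.0])
best = link(mention, kb_index)
```

With one fixed-size vector per entity, linking reduces to a nearest-neighbor search, and the index size grows linearly with the number of entities and the vector dimension. The brute-force scan in `link` is only for illustration; a large KB would normally use an approximate nearest-neighbor index instead.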