The Entity Set Expansion (ESE) task aims to expand a handful of seed entities with new entities belonging to the same semantic class. Conventional ESE methods rely on a single modality (i.e., the literal (textual) modality) and struggle to deal with complex real-world entities such as (1) negative entities with fine-grained semantic differences, (2) synonymous entities, (3) polysemous entities, and (4) long-tailed entities. These challenges prompt us to propose Multi-modal Entity Set Expansion (MESE), in which models integrate information from multiple modalities to represent entities. Intuitively, the benefits of multi-modal information for ESE are threefold: (1) different modalities can provide complementary information; (2) multi-modal information provides a unified signal via common visual properties for the same semantic class or entity; (3) multi-modal information offers a robust alignment signal for synonymous entities. To assess model performance on MESE and facilitate further research, we construct the MESED dataset, the first large-scale multi-modal dataset for ESE with elaborate manual calibration. We also propose MultiExpan, a powerful multi-modal model pre-trained on four multi-modal pre-training tasks. Extensive experiments and analyses on MESED demonstrate the high quality of the dataset and the effectiveness of MultiExpan, and point out directions for future research.