Recent advances in named entity recognition (NER) have pushed the boundary of the task to incorporate visual signals, giving rise to variants such as multi-modal NER (MNER) and grounded MNER (GMNER). A key challenge in these tasks is that the model must generalize to entities unseen during training and must handle training samples with noisy annotations. To address these obstacles, we propose SCANNER (Span CANdidate detection and recognition for NER), a model capable of effectively handling all three NER variants. SCANNER has a two-stage structure: the first stage extracts entity candidates, which are then used as queries to retrieve knowledge from various sources. Using this entity-centric knowledge to address unseen entities boosts our performance. Furthermore, to tackle the challenges arising from noisy annotations in NER datasets, we introduce a novel self-distillation method that enhances the robustness and accuracy of our model when training on data with inherent uncertainties. Our approach achieves competitive performance on the NER benchmark and surpasses existing methods on both MNER and GMNER benchmarks. Further analysis shows that the proposed distillation and knowledge utilization methods improve the performance of our model across various benchmarks.
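The abstract does not specify how SCANNER's self-distillation is formulated. As a generic illustration only, one common form of self-distillation for noisy labels trains the student against a blend of the annotated one-hot label and the softened predictions of a teacher, such as an earlier snapshot of the same model. The sketch below assumes that formulation and applies it to the label distribution of a single span candidate. The names `soft_target` and `distillation_loss` and the mixing weight `alpha` are hypothetical and are not taken from the paper.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target(gold: int, teacher_probs: List[float], alpha: float) -> List[float]:
    """Blend the (possibly noisy) one-hot gold label with the teacher's distribution.

    alpha = 1.0 recovers the plain one-hot label; smaller values trust the teacher more.
    """
    return [
        alpha * (1.0 if k == gold else 0.0) + (1.0 - alpha) * p
        for k, p in enumerate(teacher_probs)
    ]


def distillation_loss(
    student_logits: List[float],
    gold: int,
    teacher_logits: List[float],
    alpha: float = 0.5,
) -> float:
    """Cross-entropy between the blended soft target and the student's prediction."""
    target = soft_target(gold, softmax(teacher_logits), alpha)
    student_probs = softmax(student_logits)
    # Skip zero-weight classes; they contribute nothing to the sum.
    return -sum(t * math.log(p) for t, p in zip(target, student_probs) if t > 0)


# Hypothetical label set for one span candidate: ["O", "PER", "LOC"].
loss = distillation_loss([0.1, 2.0, -1.0], gold=1, teacher_logits=[0.5, 1.5, 0.2])
print(f"distillation loss: {loss:.4f}")
```

With `alpha = 1.0` the loss reduces to ordinary cross-entropy on the annotated label. Lowering `alpha` lets the teacher's distribution soften labels the model finds implausible, which is one way to reduce the impact of annotation noise.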