Multimodal Named Entity Recognition (MNER) is a crucial task for extracting information from social media platforms such as Twitter. Most current methods rely on attention weights to extract information from both text and images, but they are often unreliable and lack interpretability. To address this problem, we propose incorporating uncertainty estimation into the MNER task to produce trustworthy predictions. Our algorithm models the distribution of each modality as a Normal-inverse Gamma (NIG) distribution and fuses these into a unified distribution through an evidential fusion mechanism, which enables hierarchical characterization of uncertainties and improves both prediction accuracy and trustworthiness. Additionally, we explore the potential of pre-trained large foundation models in MNER and propose an efficient fusion approach that leverages their robust feature representations. Experiments on two datasets demonstrate that our method outperforms the baselines and achieves new state-of-the-art performance.
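To make the idea of fusing per-modality Normal-inverse Gamma distributions concrete, the following is a minimal sketch, not the method as implemented in the paper. The abstract does not state the fusion rule, so the sketch assumes the NIG summation operator used in the mixture-of-NIG literature. The names `NIG` and `fuse`, the scalar setting, and the parameter values are all hypothetical, chosen only for illustration; how such a distribution would be attached to per-token entity predictions is not specified in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NIG:
    """Normal-inverse Gamma parameters for one modality (illustrative only)."""
    mean: float   # predicted value
    nu: float     # evidence backing the mean, must be > 0
    alpha: float  # shape parameter, must be > 1 for the variances below
    beta: float   # scale parameter, must be > 0

    def aleatoric(self) -> float:
        # Expected data noise: E[sigma^2] = beta / (alpha - 1)
        return self.beta / (self.alpha - 1.0)

    def epistemic(self) -> float:
        # Uncertainty about the mean: Var[mu] = beta / (nu * (alpha - 1))
        return self.beta / (self.nu * (self.alpha - 1.0))


def fuse(a: NIG, b: NIG) -> NIG:
    """Combine two NIG distributions with the summation operator (an assumption)."""
    nu = a.nu + b.nu
    # The fused mean is weighted by evidence, so the more confident modality dominates.
    mean = (a.nu * a.mean + b.nu * b.mean) / nu
    alpha = a.alpha + b.alpha + 0.5
    # Disagreement between each modality and the fused mean inflates beta.
    beta = (a.beta + b.beta
            + 0.5 * a.nu * (a.mean - mean) ** 2
            + 0.5 * b.nu * (b.mean - mean) ** 2)
    return NIG(mean, nu, alpha, beta)


# Hypothetical values: the text branch is more confident than the image branch.
text = NIG(mean=1.0, nu=4.0, alpha=3.0, beta=2.0)
image = NIG(mean=3.0, nu=1.0, alpha=2.0, beta=2.0)
fused = fuse(text, image)
print(fused, fused.aleatoric(), fused.epistemic())
```

Under this assumed operator, the fused mean lies closer to the modality with more evidence, and the two variance terms give the separate aleatoric and epistemic readings that an uncertainty-aware model can report alongside its prediction.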