Semi-supervised entity alignment (EA) is a practical and challenging task because of the lack of adequate labeled mappings as training data. Most works address this problem by generating pseudo mappings for unlabeled entities. However, they either suffer from erroneous (noisy) pseudo mappings or largely ignore the uncertainty of pseudo mappings. In this paper, we propose a novel semi-supervised EA method, termed MixTEA, which guides model learning with an end-to-end mixture teaching of manually labeled mappings and probabilistic pseudo mappings. We first train a student model using a few labeled mappings as standard. More importantly, for pseudo mapping learning, we propose a bi-directional voting (BDV) strategy that fuses the alignment decisions in different directions to estimate uncertainty via a joint matching confidence score. We also design a matching diversity-based rectification (MDR) module to adjust pseudo mapping learning, thus reducing the negative influence of noisy mappings. Extensive results on benchmark datasets, together with further analyses, demonstrate the superiority and effectiveness of the proposed method.
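To make the bi-directional voting idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a plain similarity matrix between source and target entities, a softmax in each direction, agreement of the two argmax decisions as the voting rule, and the product of the two probabilities as the joint matching confidence score. The function names `softmax` and `bidirectional_voting` and all of these modeling choices are illustrative assumptions; the abstract does not specify them.

```python
import math


def softmax(scores):
    """Turn a list of similarity scores into a probability distribution."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bidirectional_voting(sim):
    """Select pseudo mappings on which both alignment directions agree.

    sim[i][j] is the similarity between source entity i and target entity j.
    Returns {(i, j): confidence}. The confidence is ASSUMED here to be the
    product of the two directional probabilities (hypothetical choice).
    """
    n_src, n_tgt = len(sim), len(sim[0])

    # Source -> target: each source entity scores all target entities.
    p_s2t = [softmax(row) for row in sim]
    # Target -> source: each target entity scores all source entities.
    p_t2s = [softmax([sim[i][j] for i in range(n_src)]) for j in range(n_tgt)]

    pseudo = {}
    for i in range(n_src):
        # Decision of source entity i in the source -> target direction.
        j = max(range(n_tgt), key=lambda k: p_s2t[i][k])
        # Decision of target entity j in the target -> source direction.
        i_back = max(range(n_src), key=lambda k: p_t2s[j][k])
        if i_back == i:
            # Both directions vote for (i, j): keep it with a soft score.
            pseudo[(i, j)] = p_s2t[i][j] * p_t2s[j][i]
    return pseudo


# Toy similarity matrix: 3 source entities x 3 target entities.
sim = [
    [0.9, 0.1, 0.2],
    [0.2, 0.8, 0.3],
    [0.3, 0.7, 0.1],
]
pseudo = bidirectional_voting(sim)
for pair, conf in sorted(pseudo.items()):
    print(pair, round(conf, 3))
```

In this toy example, source entities 1 and 2 both prefer target entity 1, but target entity 1 prefers source entity 1, so only the pair on which both directions agree is kept. The resulting score could then weight the pseudo mapping in the training loss, which is one way to read "probabilistic pseudo mappings"; the exact weighting used in MixTEA is not given in the abstract.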