Concepts benefit natural language understanding, but concept coverage in existing knowledge graphs (KGs) is far from complete. Recently, pre-trained language models (PLMs) have been widely used in text-based concept extraction (CE). However, PLMs tend to mine co-occurrence associations from massive corpora as pre-trained knowledge rather than the real causal effects between tokens. As a result, the pre-trained knowledge confounds PLMs, leading them to extract biased concepts based on spurious co-occurrence correlations, which inevitably results in low precision. In this paper, through the lens of a Structural Causal Model (SCM), we propose equipping the PLM-based extractor with a knowledge-guided prompt as an intervention to alleviate concept bias. The prompt adopts the topic of the given entity from the existing knowledge in KGs to mitigate the spurious co-occurrence correlations between entities and biased concepts. Our extensive experiments on representative multilingual KG datasets demonstrate that the proposed prompt effectively alleviates concept bias and improves the performance of PLM-based CE models. The code has been released at https://github.com/siyuyuan/KPCE.
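To make the idea of a knowledge-guided prompt concrete, the following is a minimal sketch of how an entity's topic might be prepended to the input text before it is passed to a PLM-based extractor. It is an illustration under stated assumptions, not the released implementation: the lookup table `TOY_ENTITY_TOPICS`, the function `build_prompted_input`, the `topic: ...` template, and the `[SEP]` separator string are all hypothetical names and choices introduced here.

```python
from typing import Dict

# Hypothetical toy lookup standing in for topic knowledge stored in a KG.
TOY_ENTITY_TOPICS: Dict[str, str] = {
    "Paris": "location",
}


def build_prompted_input(
    entity: str,
    text: str,
    entity_topics: Dict[str, str],
    sep: str = " [SEP] ",
) -> str:
    """Prepend a topic prompt for `entity` to `text`.

    If the entity has no known topic, the text is returned unchanged,
    i.e. the extractor falls back to its unprompted input.
    """
    topic = entity_topics.get(entity)
    if topic is None:
        return text
    # Assumed template: the prompt is simply the topic label.
    prompt = f"topic: {topic}"
    return prompt + sep + text


if __name__ == "__main__":
    example = build_prompted_input(
        "Paris", "Paris is the capital of France.", TOY_ENTITY_TOPICS
    )
    print(example)
```

In this sketch the prompt only changes the input string; how the extractor consumes it (for example, as the first segment of a sentence pair) is left open, since the abstract does not specify it.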