Concepts benefit natural language understanding but are far from complete in existing knowledge graphs (KGs). Recently, pre-trained language models (PLMs) have been widely used in text-based concept extraction (CE). However, PLMs tend to mine co-occurrence associations from massive corpora as pre-trained knowledge, rather than the real causal effects between tokens. As a result, the pre-trained knowledge confounds PLMs, leading them to extract biased concepts based on spurious co-occurrence correlations and inevitably resulting in low precision. In this paper, through the lens of a Structural Causal Model (SCM), we propose equipping the PLM-based extractor with a knowledge-guided prompt as an intervention to alleviate concept bias. The prompt adopts the topic of the given entity from the existing knowledge in KGs to mitigate the spurious co-occurrence correlations between entities and biased concepts. Extensive experiments on representative multilingual KG datasets demonstrate that the proposed prompt can effectively alleviate concept bias and improve the performance of PLM-based CE models. The code has been released at https://github.com/siyuyuan/KPCE.
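To make the idea of a knowledge-guided prompt more concrete, the following is a minimal sketch, not the released KPCE implementation. It assumes a toy KG represented as a Python dict that maps each entity to a topic (`ENTITY_TOPICS`), and a hypothetical prompt template `[topic] entity: text`. Both the dict and `build_prompted_input` are illustrative names introduced here. The sketch only shows how a topic retrieved from existing KG knowledge can be prepended to the entity's text, so that a downstream extractor conditions on that topic instead of relying solely on co-occurrence statistics learned during pre-training.

```python
from typing import Dict

# Hypothetical toy KG mapping entities to topics (assumption for illustration only).
ENTITY_TOPICS: Dict[str, str] = {
    "Louis Armstrong": "person",
    "Paris": "location",
}


def build_prompted_input(
    entity: str,
    text: str,
    kg: Dict[str, str],
    default_topic: str = "unknown",
) -> str:
    """Prepend a topic prompt drawn from the KG to the entity's text.

    The "[topic] entity: text" template is a hypothetical choice. Entities
    missing from the KG fall back to `default_topic`.
    """
    topic = kg.get(entity, default_topic)
    return f"[{topic}] {entity}: {text}"


if __name__ == "__main__":
    print(build_prompted_input(
        "Louis Armstrong",
        "Louis Armstrong was an American trumpeter and singer.",
        ENTITY_TOPICS,
    ))
    # -> [person] Louis Armstrong: Louis Armstrong was an American trumpeter and singer.
```

The prompted string would then be fed to a PLM-based extractor in place of the raw text. How the topic is encoded (a plain-text prefix, special tokens, or learned embeddings) is a design choice this sketch does not settle.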