Automatically constructing knowledge graphs (KGs) that cover diverse new relations is crucial for supporting knowledge discovery and a broad range of applications. Previous KG construction methods, based on either crowdsourcing or text mining, are often limited to a small predefined set of relations because of manual cost or restrictions of the text corpus. Recent research has proposed using pretrained language models (LMs) as implicit knowledge bases that accept knowledge queries in the form of prompts. Yet this implicit knowledge lacks many desirable properties of a full-scale symbolic KG, such as easy access, navigation, editing, and quality assurance. In this paper, we propose a new approach for harvesting massive KGs of arbitrary relations from pretrained LMs. Given only a minimal relation definition as input (a prompt and a few example entity pairs), the approach efficiently searches the vast entity pair space to extract diverse, accurate knowledge of the desired relation. We develop an effective search-and-rescore mechanism for improved efficiency and accuracy. We deploy the approach to harvest KGs of over 400 new relations from different LMs. Extensive human and automatic evaluations show that our approach extracts diverse, accurate knowledge, including tuples of complex relations (e.g., "A is capable of but not good at B"). The resulting KGs, as a symbolic interpretation of the source LMs, also reveal new insights into the LMs' knowledge capacities.
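To make the search-and-rescore idea concrete, the following is a minimal sketch under stated assumptions, not the implementation used in this work. It assumes the search stage ranks candidate entity pairs with an inexpensive score and keeps a shortlist, and that the rescoring stage re-ranks the shortlist with a more expensive score. The function `search_and_rescore`, the parameters `beam` and `top_k`, the prompt string, and the two toy score tables are all hypothetical; in practice both scores would be derived from a pretrained LM.

```python
import heapq
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Pair = Tuple[str, str]
Scorer = Callable[[str, Pair], float]


def search_and_rescore(
    prompt: str,
    heads: Sequence[str],
    tails: Sequence[str],
    cheap_score: Scorer,
    full_score: Scorer,
    beam: int = 3,
    top_k: int = 2,
) -> List[Tuple[Pair, float]]:
    """Hypothetical two-stage extraction: prune with a cheap score, then re-rank."""
    # Search stage: keep only the `beam` best pairs under the cheap score.
    shortlist = heapq.nlargest(
        beam, product(heads, tails), key=lambda pair: cheap_score(prompt, pair)
    )
    # Rescore stage: apply the expensive score to the shortlist only.
    rescored = [(pair, full_score(prompt, pair)) for pair in shortlist]
    rescored.sort(key=lambda item: item[1], reverse=True)
    return rescored[:top_k]


# Toy stand-ins for LM-derived scores (illustrative values, not real model output).
CHEAP: Dict[Pair, float] = {
    ("bird", "fly"): 0.9,
    ("bird", "swim"): 0.6,
    ("fish", "fly"): 0.1,
    ("fish", "swim"): 0.7,
}
FULL: Dict[Pair, float] = {
    ("bird", "fly"): 0.95,
    ("bird", "swim"): 0.2,
    ("fish", "fly"): 0.05,
    ("fish", "swim"): 0.9,
}


def toy_cheap(prompt: str, pair: Pair) -> float:
    # The toy scorer ignores the prompt; an LM-based scorer would condition on it.
    return CHEAP.get(pair, 0.0)


def toy_full(prompt: str, pair: Pair) -> float:
    return FULL.get(pair, 0.0)


if __name__ == "__main__":
    triples = search_and_rescore(
        "A is capable of B", ["bird", "fish"], ["fly", "swim"], toy_cheap, toy_full
    )
    for (head, tail), score in triples:
        print(head, tail, score)
```

The point of the two stages in this sketch is that the expensive score is computed only for the shortlist, so its cost grows with `beam` rather than with the size of the full entity pair space.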