General-purpose knowledge bases (KBs) are a cornerstone of knowledge-centric AI. Many are constructed pragmatically from Web sources and are therefore far from complete. This poses challenges for both consuming and curating their content. While several surveys target the problem of completing incomplete KBs, a more basic problem is arguably to know whether and where a KB is incomplete, and to what degree. In this survey we discuss how knowledge about completeness, recall, and negation in KBs can be expressed, extracted, and inferred. We cover (i) the logical foundations of knowledge representation and querying under partial closed-world semantics; (ii) the estimation of completeness information via statistical patterns; (iii) the extraction of information about recall from KBs and text; (iv) the identification of interesting negative statements; and (v) relaxed notions of relative recall. The survey targets two audiences: (1) practitioners who are interested in tracking KB quality, focusing extraction efforts, and building quality-aware downstream applications; and (2) data management, knowledge base, and Semantic Web researchers who wish to understand the state of the art of knowledge bases beyond the open-world assumption. Accordingly, the survey presents both fundamental methodologies and how they work, and gives practice-oriented recommendations on how to choose between different approaches for a problem at hand.