General-purpose knowledge bases (KBs) are a cornerstone of knowledge-centric AI. Many of them are constructed pragmatically from Web sources and are thus far from complete. This poses challenges for both the consumption and the curation of their content. While several surveys target the problem of completing incomplete KBs, a more basic problem is arguably to know whether and where a KB is incomplete in the first place, and to what degree. In this survey, we discuss how knowledge about completeness, recall, and negation in KBs can be expressed, extracted, and inferred. We cover (i) the logical foundations of knowledge representation and querying under partial closed-world semantics; (ii) the estimation of this completeness information via statistical patterns; (iii) the extraction of information about recall from KBs and text; (iv) the identification of interesting negative statements; and (v) relaxed notions of relative recall. This survey is targeted at two audiences: (1) practitioners who are interested in tracking KB quality, focusing extraction efforts, and building quality-aware downstream applications; and (2) data management, knowledge base, and Semantic Web researchers who wish to understand the state of the art of knowledge bases beyond the open-world assumption. Consequently, our survey presents both fundamental methodologies and how they work, and gives practice-oriented recommendations on how to choose between different approaches for the problem at hand.