Similarity between entities arises in many real-world scenarios. For over a century, researchers in different fields have proposed a range of approaches to measure the similarity between entities. More recently, inspired by "Google Sets", significant academic and commercial efforts have been devoted to expanding a given set of entities with similar ones. As a result, existing approaches can now take into account the properties shared by entities, hereinafter called nexus of similarity. Machines are thus largely able to handle both similarity measurement and set expansion. To the best of our knowledge, however, there is no way to characterize the nexus of similarity between entities, that is, to identify such nexus formally and comprehensively so that they are both machine- and human-readable. Moreover, there is no consensus on how to evaluate existing approaches on weakly similar entities. As a first step towards filling these gaps, we aim to complement the existing literature by developing a novel logic-based framework that formally and automatically characterizes the nexus of similarity between tuples of entities within a knowledge base. We also analyze the computational complexity of this framework.