Entity summarization aims to compute concise summaries for entities in knowledge graphs. Existing datasets and benchmarks are often limited to a few hundred entities and discard the graph structure of the source knowledge graphs. This limitation is particularly pronounced for ground-truth summaries: only a few labeled summaries exist for evaluation and training. We propose WikES, a comprehensive benchmark comprising entities, their summaries, and their connections. Additionally, WikES features a dataset generator for testing entity summarization algorithms in different areas of the knowledge graph. Importantly, our approach combines graph algorithms, NLP models, and different data sources so that WikES does not require human annotation, which makes the approach cost-effective and generalizable to multiple domains. Finally, WikES is scalable and capable of capturing the complexities of knowledge graphs in terms of topology and semantics. WikES also includes existing datasets for comparison. Empirical studies of entity summarization methods confirm the usefulness of our benchmark. Data, code, and models are available at: https://github.com/msorkhpar/wiki-entity-summarization.