No existing dataset adequately tests how well language models can incrementally update entity summaries, a crucial ability as these models rapidly advance. The Incremental Entity Summarization (IES) task is vital for maintaining accurate, up-to-date knowledge. To address this gap, we introduce SUMIE, a fully synthetic dataset designed to expose real-world IES challenges. The dataset highlights problems such as incorrect entity association and incomplete information presentation. Unlike common synthetic datasets, ours captures the complexity and nuances found in real-world data. We generate informative and diverse attributes, summaries, and unstructured paragraphs in sequence, ensuring high quality. The alignment between generated summaries and paragraphs exceeds 96%, confirming the dataset's quality. Extensive experiments demonstrate the dataset's difficulty: state-of-the-art LLMs struggle to update summaries with an F1 higher than 80.4%. We will open-source the benchmark and the evaluation metrics to help the community make progress on IES tasks.