Query-focused summarization (QFS) requires generating a summary for a given query from a set of relevant documents. However, such relevant documents must be annotated manually and are therefore not readily available in realistic scenarios. To address this limitation, we treat QFS as a knowledge-intensive (KI) task without access to any relevant documents. Instead, we assume that these documents are present in a large-scale knowledge corpus and must be retrieved first. To explore this new setting, we build a new dataset (KI-QFS) by adapting existing QFS datasets. In this dataset, answering the query requires document retrieval from a knowledge corpus. We construct three different knowledge corpora and provide relevance annotations to enable retrieval evaluation. Finally, we benchmark the dataset with state-of-the-art QFS models and retrieval-enhanced models. The experimental results demonstrate that QFS models perform significantly worse on KI-QFS than on the original QFS task, indicating that the knowledge-intensive setting is much more challenging and offers substantial room for improvement. We believe that our investigation will inspire further research into addressing QFS in more realistic scenarios.
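To make the retrieval-evaluation part of this setting concrete, the following is a minimal sketch of scoring a retriever against relevance annotations. It is an illustration under stated assumptions, not the method used in this work: the token-overlap scorer, the toy corpus, the document ids, and the function names `overlap_score`, `retrieve`, and `recall_at_k` are all hypothetical, and the metric shown (recall at k) is one common choice rather than one confirmed by the text.

```python
from collections import Counter


def overlap_score(query: str, document: str) -> int:
    """Count query tokens that also occur in the document (with multiplicity)."""
    query_counts = Counter(query.lower().split())
    doc_counts = Counter(document.lower().split())
    return sum(min(query_counts[tok], doc_counts[tok]) for tok in query_counts)


def retrieve(query: str, corpus: dict[str, str], k: int) -> list[str]:
    """Return the ids of the k documents with the highest overlap score."""
    ranked = sorted(
        corpus,
        key=lambda doc_id: overlap_score(query, corpus[doc_id]),
        reverse=True,
    )
    return ranked[:k]


def recall_at_k(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of annotated relevant documents that appear in the retrieved list."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


# Toy knowledge corpus and relevance annotations (hypothetical data).
corpus = {
    "d1": "solar panels convert sunlight into electricity",
    "d2": "wind turbines generate electricity from wind",
    "d3": "the history of the roman empire",
}
relevant = {"d1", "d2"}

top = retrieve("how is electricity generated from sunlight", corpus, k=2)
score = recall_at_k(top, relevant)
```

In a full pipeline, the retrieved documents would then be passed to a summarization model together with the query. That step is omitted here because it depends on the specific model.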