Large language models (LLMs) are capable of generating coherent summaries from very long contexts given a user query, and extracting and citing evidence spans helps improve the trustworthiness of these summaries. Whereas previous work has focused on evidence citation at fixed levels of granularity (e.g., sentence, paragraph, or document), we propose extracting unstructured evidence (i.e., spans of any length) in order to acquire more relevant and consistent evidence than in the fixed-granularity case. We show that existing systems struggle to copy and properly cite unstructured evidence, which also tends to be "lost in the middle". To help models perform this task, we create the Summaries with Unstructured Evidence Text dataset (SUnsET), a synthetic dataset generated using a novel pipeline, which can be used as training supervision for unstructured evidence summarization. We demonstrate across 5 LLMs and 4 datasets spanning human-written, synthetic, single-document, and multi-document settings that LLMs adapted with SUnsET generate evidence that is more relevant to and factually consistent with their summaries, extract evidence from more diverse locations in their context, and produce more relevant and consistent summaries than baselines without fine-tuning and baselines using fixed-granularity evidence. We release SUnsET and our generation code to the public.