Selecting the ``right'' amount of information to include in a summary is a difficult task. A good summary should be detailed and entity-centric without being overly dense and hard to follow. To better understand this tradeoff, we solicit increasingly dense GPT-4 summaries with what we refer to as a ``Chain of Density'' (CoD) prompt. Specifically, GPT-4 generates an initial entity-sparse summary before iteratively incorporating missing salient entities without increasing the length. Summaries generated by CoD are more abstractive, exhibit more fusion, and have less of a lead bias than GPT-4 summaries generated by a vanilla prompt. We conduct a human preference study on 100 CNN DailyMail articles and find that humans prefer GPT-4 summaries that are denser than those generated by a vanilla prompt and almost as dense as human-written summaries. Qualitative analysis supports the notion of a tradeoff between informativeness and readability. The 500 annotated CoD summaries, plus an additional 5,000 unannotated summaries, are freely available on HuggingFace (https://huggingface.co/datasets/griffin/chain_of_density).
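The iterative densification described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the authors' implementation. In CoD, GPT-4 itself selects and incorporates the missing salient entities. Here, by contrast, the candidate entity list is supplied up front, and the rewriter is a hypothetical plug-in function (`toy_rewrite` is a toy stand-in for the LLM). Density is approximated as candidate entities mentioned per whitespace token, and `steps=5` and `per_step=3` are illustrative defaults.

```python
from typing import Callable, List, Sequence

# Hypothetical rewriter signature: (previous summary, missing entities) -> new summary.
# In CoD this role is played by GPT-4; here it is any plug-in function.
Rewriter = Callable[[str, List[str]], str]


def mentions(summary: str, entity: str) -> bool:
    """Crude case-insensitive substring check (an assumption for this sketch)."""
    return entity.lower() in summary.lower()


def entity_density(summary: str, entities: Sequence[str]) -> float:
    """Assumed density measure: candidate entities mentioned per whitespace token."""
    tokens = summary.split()
    if not tokens:
        return 0.0
    return sum(mentions(summary, e) for e in entities) / len(tokens)


def chain_of_density(
    initial: str,
    entities: Sequence[str],
    rewrite: Rewriter,
    steps: int = 5,
    per_step: int = 3,
) -> List[str]:
    """Densify a summary step by step without letting it grow longer."""
    chain = [initial]
    budget = len(initial.split())  # word budget is fixed by the initial summary
    for _ in range(steps - 1):
        current = chain[-1]
        missing = [e for e in entities if not mentions(current, e)]
        if not missing:
            break  # every candidate entity is already covered
        candidate = rewrite(current, missing[:per_step])
        added = [e for e in missing if mentions(candidate, e)]
        dropped = [e for e in entities if mentions(current, e) and not mentions(candidate, e)]
        # Stop if the step adds nothing, loses an entity, or exceeds the budget.
        if not added or dropped or len(candidate.split()) > budget:
            break
        chain.append(candidate)
    return chain


# Toy stand-in for the LLM: swap the first vague filler word for the first
# missing entity, which keeps the word count unchanged.
FILLERS = {"someone", "somewhere", "something"}


def toy_rewrite(summary: str, missing: List[str]) -> str:
    words = summary.split()
    for i, word in enumerate(words):
        if word in FILLERS:
            words[i] = missing[0]
            break
    return " ".join(words)


ENTITIES = ["Alice", "Paris", "budgets"]
chain = chain_of_density("someone visited somewhere to discuss something", ENTITIES, toy_rewrite)
for summary in chain:
    print(f"{entity_density(summary, ENTITIES):.2f}  {summary}")
```

Holding the word budget at the initial summary's length means each step can only add an entity by compressing or replacing existing wording, rather than by appending to it.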