From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting

Selecting the "right" amount of information to include in a summary is a difficult task. A good summary should be detailed and entity-centric without being overly dense and hard to follow. To better understand this tradeoff, we solicit increasingly dense GPT-4 summaries with what we refer to as a "Chain of Density" (CoD) prompt. Specifically, GPT-4 generates an initial entity-sparse summary before iteratively incorporating missing salient entities without increasing the length. Summaries generated by CoD are more abstractive, exhibit more fusion, and have less of a lead bias than GPT-4 summaries generated by a vanilla prompt. We conduct a human preference study on 100 CNN DailyMail articles and find that humans prefer GPT-4 summaries that are more dense than those generated by a vanilla prompt and almost as dense as human-written summaries. Qualitative analysis supports the notion that there exists a tradeoff between informativeness and readability. 500 annotated CoD summaries, as well as an extra 5,000 unannotated summaries, are freely available on HuggingFace (https://huggingface.co/datasets/griffin/chain_of_density).
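The prompting loop described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' released prompt. `generate` is a hypothetical stand-in for any GPT-4 call. The prompt wording, the helper names (`initial_prompt`, `densify_prompt`, `chain_of_density`, `entity_density`), and the default of five steps are illustrative assumptions. `entity_density` approximates density as matched entities per whitespace token, using an entity list supplied by the caller in place of a real entity extractor and tokenizer.

```python
from typing import Callable, List, Set

# Hypothetical stand-in for a GPT-4 call: maps a prompt string to a completion.
Generate = Callable[[str], str]


def initial_prompt(article: str) -> str:
    # Hypothetical wording: ask for a short, entity-sparse first summary.
    return (
        f"Article:\n{article}\n\n"
        "Write a short summary that mentions only a few entities."
    )


def densify_prompt(article: str, previous: str) -> str:
    # Hypothetical wording: add missing salient entities without growing the summary.
    return (
        f"Article:\n{article}\n\n"
        f"Previous summary:\n{previous}\n\n"
        "Identify salient entities from the article that are missing from the "
        "previous summary. Rewrite the summary to keep every entity it already "
        "contains, add the missing ones, and stay the same length by fusing "
        "and compressing existing content."
    )


def chain_of_density(article: str, generate: Generate, steps: int = 5) -> List[str]:
    """Return the chain of summaries.

    The first element is the sparse initial summary; later elements are
    intended to be denser. The length constraint is only requested in the
    prompt; this loop does not check or enforce it.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    summaries = [generate(initial_prompt(article))]
    for _ in range(steps - 1):
        # Each step rewrites the most recent summary.
        summaries.append(generate(densify_prompt(article, summaries[-1])))
    return summaries


def entity_density(summary: str, entities: Set[str]) -> float:
    """Entities per token, simplified to substring matches per whitespace token."""
    tokens = summary.split()
    if not tokens:
        return 0.0
    found = sum(1 for entity in entities if entity in summary)
    return found / len(tokens)
```

In practice `generate` would wrap a chat-completion request. For offline testing, any function with the same shape (for example, one that returns canned strings) can be passed in, which keeps the loop independent of a particular client library.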
