Despite recent progress in language generation models, their outputs may not always meet user expectations. In this work, we study whether informational feedback in natural language can be leveraged to improve generation quality and alignment with user preferences. To this end, we take factual consistency in summarization, the requirement that a summary contain only information supported by the input documents, as the user-expected preference. We collect a high-quality dataset, DeFacto, containing human demonstrations and informational natural language feedback, which consists of corrective instructions, edited summaries, and explanations with respect to the factual consistency of the summary. Using our dataset, we study three natural language generation tasks: (1) editing a summary by following the human feedback, (2) generating human feedback for editing the original summary, and (3) revising the initial summary to correct factual errors by generating both the human feedback and the edited summary. We show that DeFacto provides factually consistent human-edited summaries and, thanks to its informational natural language feedback, further insights into factual consistency in summarization. We further demonstrate that fine-tuned language models can leverage our dataset to improve the factual consistency of summaries, whereas large language models lack zero-shot learning ability on our proposed tasks, which require controllable text generation.