Controllable summarization allows users to generate customized summaries with specified attributes. However, because no datasets are annotated specifically for controlled summaries, existing work has had to craft pseudo datasets by adapting generic summarization benchmarks. Furthermore, most research focuses on controlling a single attribute at a time (e.g., a short summary or a highly abstractive summary) rather than a mix of attributes together (e.g., a short and highly abstractive summary). In this paper, we propose MACSum, the first human-annotated summarization dataset for controlling mixed attributes. It contains source texts from two domains, news articles and dialogues, with human-annotated summaries controlled by five designed attributes (Length, Extractiveness, Specificity, Topic, and Speaker). We propose two simple and effective parameter-efficient approaches for the new task of mixed controllable summarization, based on hard prompt tuning and soft prefix tuning. Results and analysis demonstrate that hard prompt models yield the best performance on all metrics and human evaluations. However, mixed-attribute control remains challenging for summarization tasks. Our dataset and code are available at https://github.com/psunlpgroup/MACSum.
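To make the idea of a hard prompt over mixed attributes concrete, the following is a minimal sketch of how several control attributes might be serialized as plain text and prepended to a source document. Only the five attribute names come from the abstract. The `ControlSpec` class, the `build_hard_prompt` function, the value strings such as "short", and the separator layout are all hypothetical choices made for illustration and are not taken from the MACSum code.

```python
from dataclasses import dataclass
from typing import Optional

# The five attribute names come from the abstract. The value strings, the
# separators, and the overall prompt layout are illustrative assumptions.
ATTRIBUTES = ("Length", "Extractiveness", "Specificity", "Topic", "Speaker")


@dataclass
class ControlSpec:
    """Hypothetical container for a mix of control attributes."""
    length: Optional[str] = None
    extractiveness: Optional[str] = None
    specificity: Optional[str] = None
    topic: Optional[str] = None
    speaker: Optional[str] = None


def build_hard_prompt(spec: ControlSpec, source: str, sep: str = " ; ") -> str:
    """Prepend the specified attributes to the source text as plain text."""
    parts = []
    for name in ATTRIBUTES:
        value = getattr(spec, name.lower())
        if value is not None:  # attributes left unspecified are omitted
            parts.append(f"{name}: {value}")
    control = sep.join(parts)
    # With no attributes set, the input is just the uncontrolled source.
    return f"{control} | {source}" if control else source


if __name__ == "__main__":
    spec = ControlSpec(length="short", extractiveness="low")
    print(build_hard_prompt(spec, "The council met on Monday."))
```

In this sketch, any subset of the five attributes can be set, so the same function covers single-attribute and mixed-attribute control. The resulting string would then be passed to a summarization model as its input.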