Large language models (LLMs) have demonstrated remarkable performance in abstractive summarization tasks. However, their ability to precisely control summary attributes (e.g., length or topic) remains underexplored, limiting their adaptability to specific user preferences. In this paper, we systematically explore the controllability of LLMs. To this end, we revisit summary attribute measurements and introduce two iterative evaluation metrics, failure rate and average iteration count, to precisely evaluate the controllability of LLMs rather than merely measuring errors. Our findings show that LLMs struggle more with numerical attributes than with linguistic attributes. To address this challenge, we propose a guide-to-explain framework (GTE) for controllable summarization. GTE enables the model to identify misaligned attributes in its initial draft and guides it to explain the errors in its previous output. By reflecting on this misalignment, the model generates well-adjusted summaries that reliably satisfy the desired attributes, while requiring surprisingly few iterations compared with other iterative approaches.
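To make the two iterative metrics concrete, the following is a minimal Python sketch, not the authors' implementation: the function names (`iterate_until_aligned`, `revise`, `meets_attribute`) are hypothetical stand-ins, with `revise` representing one guide-and-explain round. A summary is revised until it satisfies the requested attribute or an iteration budget is exhausted, and failure rate and average iteration count are aggregated over the per-document outcomes.

```python
from typing import Callable, List, Optional


def iterate_until_aligned(
    draft: str,
    meets_attribute: Callable[[str], bool],
    revise: Callable[[str], str],
    max_iters: int = 5,
) -> Optional[int]:
    """Return how many revision iterations the summary needed to satisfy the
    requested attribute, or None if it never does within max_iters (a failure).
    The revise() callback is a placeholder for one guide-and-explain round:
    identify the misaligned attribute, have the model explain the error,
    then regenerate the summary."""
    summary = draft
    for i in range(max_iters + 1):
        if meets_attribute(summary):
            return i  # 0 means the initial draft already satisfied the attribute
        summary = revise(summary)
    return None  # still misaligned after exhausting the revision budget


def failure_rate_and_avg_iters(outcomes: List[Optional[int]]):
    """Aggregate the two iterative metrics over a set of test documents:
    failure rate = fraction of documents never aligned within the budget;
    average iteration count = mean number of revisions among the successes."""
    failures = sum(1 for o in outcomes if o is None)
    successes = [o for o in outcomes if o is not None]
    failure_rate = failures / len(outcomes)
    avg_iters = sum(successes) / len(successes) if successes else float("nan")
    return failure_rate, avg_iters


# Toy usage with a length attribute (target: at most 20 words). The revise
# lambda below is a stand-in for an actual LLM call.
if __name__ == "__main__":
    target_len = 20
    meets = lambda s: len(s.split()) <= target_len
    revise = lambda s: " ".join(s.split()[: max(len(s.split()) - 5, 1)])
    outcomes = [iterate_until_aligned("word " * 35, meets, revise)]
    print(failure_rate_and_avg_iters(outcomes))
```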