Multi-document summarization is a challenging task because of its inherent subjective bias, as reflected in the low inter-annotator ROUGE-1 score of 0.4 among the DUC-2004 reference summaries. In this work, we aim to make news summarization more objective by focusing on the main event of a group of related news documents and presenting it coherently with sufficient context. Our primary objective is to report the main event succinctly while keeping the summary objective and informative. To achieve this, we employ an extract-rewrite approach. For content selection, a main-event biased monotone submodular function extracts the information most relevant to the main event from the document cluster. For coherence, a fine-tuned large language model (LLM) rewrites the extracted content into a coherent text. Evaluation with objective metrics and human evaluators confirms the effectiveness of our approach: it surpasses the compared baselines in content coverage, coherence, and informativeness.
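To make the content-selection step concrete, the following is a minimal sketch of greedy sentence selection under a monotone submodular objective. The abstract does not specify the function used, so everything here is an illustrative assumption: the objective is a weighted word-coverage function, and the names `greedy_select`, `event_terms`, and `bias` are hypothetical. Words in `event_terms` receive an extra weight of `bias`, which tilts selection toward main-event content, while the coverage form gives diminishing returns for sentences that repeat already-covered words.

```python
from typing import List, Set


def tokenize(sentence: str) -> Set[str]:
    """Lowercase a sentence and return its set of alphanumeric word tokens."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in sentence)
    return set(cleaned.split())


def greedy_select(
    sentences: List[str],
    event_terms: Set[str],
    budget: int,
    bias: float = 4.0,
) -> List[int]:
    """Greedily pick up to `budget` sentence indices.

    Assumed objective (not taken from the paper):
        F(S) = sum of weight(w) over all words w covered by S,
        weight(w) = 1 + bias if w is a main-event term, else 1.
    Weighted coverage with non-negative weights is monotone and submodular.
    """
    token_sets = [tokenize(s) for s in sentences]
    vocab = set().union(*token_sets)
    weights = {w: 1.0 + (bias if w in event_terms else 0.0) for w in vocab}

    chosen: List[int] = []
    covered: Set[str] = set()
    while len(chosen) < budget:
        best_idx, best_gain = None, 0.0
        for i, tokens in enumerate(token_sets):
            if i in chosen:
                continue
            # Marginal gain: weight of the words this sentence newly covers.
            gain = sum(weights[w] for w in tokens - covered)
            if gain > best_gain:
                best_idx, best_gain = i, gain
        if best_idx is None:  # no remaining sentence adds any value
            break
        chosen.append(best_idx)
        covered |= token_sets[best_idx]
    return chosen
```

With a cardinality budget, this greedy procedure is the standard choice for monotone submodular objectives because it carries a (1 - 1/e) approximation guarantee. Setting `bias` to 0 recovers plain word coverage, which makes it easy to check how much the main-event weighting changes the selected sentences. The indices returned would then be passed, in order, to the rewriting model.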