Long document summarization remains a significant challenge for current large language models (LLMs): existing approaches commonly suffer from information loss, factual inconsistency, and degraded coherence when processing very long documents. We propose SummQ, a novel adversarial multi-agent framework that addresses these limitations through collaboration between specialized agents operating in two complementary domains: summarization and quizzing. Summary generators and reviewers work together to create and evaluate comprehensive summaries, while quiz generators and reviewers create comprehension questions that serve as continuous quality checks on the summarization process. This adversarial dynamic, reinforced by an examinee agent that verifies whether the generated summary contains the information needed to answer the quiz questions, enables iterative refinement through multifaceted feedback. We evaluate SummQ on three widely used long document summarization benchmarks. Experimental results show that our framework significantly outperforms existing state-of-the-art methods on ROUGE and BERTScore, as well as in LLM-as-a-Judge and human evaluations. Our analyses reveal the effectiveness of the multi-agent collaboration dynamics, the influence of different agent configurations, and the impact of the quizzing mechanism. This work establishes a new approach to long document summarization that uses adversarial agentic collaboration to improve summary quality.
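The refinement loop described above can be sketched in outline. This is a minimal illustration only: the agent functions below are stand-in string heuristics, not the paper's actual prompts, models, or scoring rules, and all names (`summq_loop`, `coverage_threshold`, etc.) are hypothetical.

```python
# Hypothetical sketch of a SummQ-style generate/quiz/verify loop.
# Each "agent" is a crude stand-in; a real system would call an LLM.

def summary_generator(document, feedback=None):
    # Stand-in generator: keep the first few sentences, expanding the
    # summary when reviewer/examinee feedback asks for more coverage.
    n = 2 if feedback is None else 3
    return ". ".join(document.split(". ")[:n]).rstrip(".") + "."

def quiz_generator(document):
    # Stand-in quiz agent: one comprehension question per sentence,
    # keyed on that sentence's leading word.
    return [f"What does the text say about '{s.split()[0]}'?"
            for s in document.split(". ") if s]

def examinee(summary, questions):
    # Examinee agent: checks whether the summary contains the
    # information needed to answer each question (here, a crude
    # keyword-containment proxy), returning a coverage score.
    answered = [q for q in questions
                if q.split("'")[1].lower() in summary.lower()]
    return len(answered) / len(questions)

def summq_loop(document, max_rounds=3, coverage_threshold=0.9):
    # Iteratively refine the summary until the examinee can answer
    # enough quiz questions from it, or the round budget runs out.
    summary, feedback = "", None
    for _ in range(max_rounds):
        summary = summary_generator(document, feedback)
        questions = quiz_generator(document)
        coverage = examinee(summary, questions)
        if coverage >= coverage_threshold:
            break
        feedback = f"coverage={coverage:.2f}; expand the summary"
    return summary, coverage
```

In this toy version the coverage score plays the role of the reviewers' multifaceted feedback; the actual framework additionally has summary and quiz reviewers critiquing their respective generators.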