Query-focused summarization (QFS) aims to produce a summary of a single document or multiple documents that satisfies the information need expressed by a given query. It is useful for various real-world applications, such as abstractive snippet generation and, more recently, retrieval-augmented generation (RAG). A prototypical QFS pipeline consists of a retriever (sparse or dense) and a generator (usually a large language model). However, large language models (LLMs) can hallucinate, especially when the evidence contradicts their prior beliefs. There has been growing interest in developing new decoding methods to improve generation quality and reduce hallucination. In this work, we conduct a large-scale reproducibility study of one recently proposed decoding method, Context-aware Decoding (CAD). In addition to replicating CAD's experiments on news summarization datasets, we include experiments on QFS datasets and conduct a more rigorous analysis of computational complexity and hyperparameter sensitivity. Experiments with eight different language models show that CAD improves QFS quality by (1) reducing factuality errors/hallucinations while (2) mostly retaining the match of lexical patterns, as measured by ROUGE scores. These gains come at the cost of increased inference-time FLOPs and reduced decoding speed. The code, implemented with the Huggingface library, is available at https://github.com/zhichaoxu-shufe/context-aware-decoding-qfs
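As an illustration of the idea behind CAD, the following minimal sketch contrasts next-token logits computed with the retrieved context against logits computed without it, using the commonly cited adjustment (1 + alpha) * logits_with_context - alpha * logits_without_context. The abstract above does not state this formula, so it is an assumption here. The function names, the toy three-token vocabulary, and the value alpha = 0.5 are also hypothetical, and the sketch is not taken from the released repository. Because each decoding step needs two sets of logits, it also hints at why inference-time FLOPs grow.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cad_adjust(logits_with_context, logits_without_context, alpha=0.5):
    """Contrast context-conditioned logits against context-free logits.

    Assumed form of the adjustment (not stated in the abstract):
        (1 + alpha) * logit(y | context, query) - alpha * logit(y | query)
    alpha = 0 recovers ordinary decoding with the context.
    """
    return [
        (1.0 + alpha) * lc - alpha * ln
        for lc, ln in zip(logits_with_context, logits_without_context)
    ]


# Toy next-token logits over a 3-token vocabulary (hypothetical values).
# Token 0 is supported by the context; token 1 is favored by the model's prior.
with_ctx = [2.0, 1.0, 0.0]     # model sees context + query
without_ctx = [0.0, 2.0, 0.0]  # model sees query only

plain = softmax(with_ctx)
adjusted = softmax(cad_adjust(with_ctx, without_ctx, alpha=0.5))

print("plain:   ", [round(p, 3) for p in plain])
print("adjusted:", [round(p, 3) for p in adjusted])
```

In this toy case the adjusted distribution shifts probability mass toward the context-supported token and away from the token favored by the prior. In a real pipeline the two logit vectors would come from two forward passes of the same language model at every decoding step.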