Query-focused summarization (QFS) aims to provide a summary of a single document or multiple documents that satisfies the information needs of a given query. It is useful for various real-world applications, such as abstractive snippet generation and, more recently, retrieval-augmented generation (RAG). A prototypical QFS pipeline consists of a retriever (sparse or dense) and a generator (usually a large language model). However, applying large language models (LLMs) potentially leads to hallucinations, especially when the evidence contradicts the prior beliefs of the LLM. There has been growing interest in developing new decoding methods that improve generation quality and reduce hallucination. In this work, we conduct a large-scale reproducibility study of one recently proposed decoding method -- Context-aware Decoding (CAD). In addition to replicating CAD's experiments on news summarization datasets, we include experiments on QFS datasets and conduct a more rigorous analysis of computational complexity and hyperparameter sensitivity. Experiments with eight different language models show that CAD improves QFS quality by (1) reducing factuality errors/hallucinations while (2) mostly retaining the match of lexical patterns, as measured by ROUGE scores, at the cost of increased inference-time FLOPs and reduced decoding speed. Our \href{https://github.com/zhichaoxu-shufe/context-aware-decoding-qfs}{code implementation}, based on the Hugging Face library, is publicly available.