Open-Domain Multi-Document Summarization (ODMDS) is crucial for addressing diverse information needs: given a user's query, it aims to generate a summary that answers the query by synthesizing relevant content from multiple documents in a large collection. Existing approaches, which first retrieve relevant passages and then generate a summary with a language model, are inadequate for ODMDS. Open-ended queries often require additional context beyond the initially retrieved passages to cover the topic comprehensively, so it is difficult to retrieve all relevant passages in a single step. While iterative retrieval methods have been explored for multi-hop question answering (MQA), they are impractical for ODMDS because repeated large language model (LLM) inference for reasoning incurs high latency. To address this issue, we propose LightPAL, a lightweight passage retrieval method for ODMDS that uses an LLM to construct a graph representing passage relationships during indexing and employs a random walk at inference time instead of iterative reasoning and retrieval. Experiments on ODMDS benchmarks show that LightPAL outperforms baseline retrievers in summary quality while being significantly more efficient than an iterative MQA approach.
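The retrieval idea described above (a passage graph built at indexing time, then a random walk at inference time) can be illustrated with a minimal sketch. The abstract does not specify the walk variant, the edge weights, or any function names, so everything here is an assumption for illustration: the sketch uses a random walk with restart, treats `graph` as a hypothetical precomputed mapping from each passage to weighted neighbors (standing in for LLM-derived links), and treats `seeds` as the scores of an initial retriever. It is not the authors' implementation.

```python
from collections import defaultdict


def random_walk_scores(graph, seeds, restart=0.15, steps=50):
    """Score passages with a random walk with restart (assumed variant).

    graph: dict passage_id -> {neighbor_id: positive weight}
           (hypothetical stand-in for links built at indexing time)
    seeds: dict passage_id -> positive score from an initial retriever
    """
    total = sum(seeds.values())
    start = {p: s / total for p, s in seeds.items()}
    scores = dict(start)
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, mass in scores.items():
            edges = graph.get(node, {})
            out = sum(edges.values())
            if out == 0:
                # Passage without outgoing links: return its mass to the seeds.
                for p, s in start.items():
                    nxt[p] += (1 - restart) * mass * s
                continue
            for nbr, w in edges.items():
                nxt[nbr] += (1 - restart) * mass * w / out
        # Restart step: jump back to the initially retrieved passages.
        for p, s in start.items():
            nxt[p] += restart * s
        scores = dict(nxt)
    return scores


def expand_retrieval(graph, seeds, k=3, **kwargs):
    """Return the k highest-scoring passage ids after the walk."""
    scores = random_walk_scores(graph, seeds, **kwargs)
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]
```

Because the graph is assumed to be built once at indexing time, the only query-time work in this sketch is the initial retrieval and a fixed number of sparse score updates, with no LLM calls in the loop. Passages that are not reachable from the seeds never receive a score.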