Accurately attributing answer text to its source document is crucial for building a reliable question-answering system. However, attribution for long documents remains largely unexplored. Post-hoc attribution systems are designed to map answer text back to the source document, yet the granularity of this mapping has not been addressed. Furthermore, a critical question arises: what exactly should be attributed? This requires identifying the specific information units within an answer that need grounding. In this paper, we propose and investigate a novel approach to the factual decomposition of generated answers for attribution, employing template-based in-context learning. To accomplish this, we incorporate the question and integrate negative sampling during few-shot in-context learning for decomposition. This approach enhances the semantic understanding of both abstractive and extractive answers. We assess the impact of answer decomposition through a thorough evaluation of attribution approaches, ranging from retrieval-based techniques to LLM-based attributors.
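The decomposition step described above can be sketched as a few-shot prompt that carries the question, positive exemplars, and a negative exemplar (mirroring negative sampling during in-context learning). This is a minimal illustrative sketch, not the paper's actual template: the exemplars, instruction wording, and helper names (`build_decomposition_prompt`, `parse_facts`) are assumptions.

```python
# Hedged sketch of question-aware, template-based few-shot decomposition.
# All exemplars and the template below are illustrative assumptions.

# A positive exemplar pairs a (question, answer) with its gold atomic facts.
POSITIVE_EXEMPLAR = {
    "question": "When was the company founded?",
    "answer": "The company was founded in 1998 and is based in Berlin.",
    "facts": ["The company was founded in 1998.",
              "The company is based in Berlin."],
}
# A negative exemplar shows content that should NOT yield a fact
# (hedged speculation has no unit to ground in the source document).
NEGATIVE_EXEMPLAR = {
    "question": "When was the company founded?",
    "answer": "I think it was probably founded a long time ago.",
    "facts": [],
}

def build_decomposition_prompt(question: str, answer: str) -> str:
    """Assemble a few-shot prompt asking an LLM to list atomic facts."""
    lines = ["Decompose each answer into atomic, checkable facts.",
             "Output one fact per line, prefixed with '- '.", ""]
    for ex in (POSITIVE_EXEMPLAR, NEGATIVE_EXEMPLAR):
        lines.append(f"Question: {ex['question']}")
        lines.append(f"Answer: {ex['answer']}")
        lines.append("Facts:")
        lines.extend([f"- {f}" for f in ex["facts"]] or ["- (none)"])
        lines.append("")
    lines += [f"Question: {question}", f"Answer: {answer}", "Facts:"]
    return "\n".join(lines)

def parse_facts(llm_output: str) -> list[str]:
    """Extract '- ' fact lines from the model's completion."""
    return [l[2:].strip() for l in llm_output.splitlines()
            if l.startswith("- ") and l != "- (none)"]
```

Each parsed fact would then be attributed independently, e.g. scored against candidate source passages by a retriever or an LLM-based attributor.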