State-of-the-art performance on QA tasks is currently achieved by systems employing Large Language Models (LLMs); however, these models tend to hallucinate information in their responses. One mitigation strategy enhances the generation process by attributing the output to supporting content in the given input. Yet identifying appropriate attributions and verifying their accuracy against a source remains a complex task, and methods for assessing such systems still need significant improvement. We introduce an attribution-oriented Chain-of-Thought reasoning method that improves attribution accuracy by focusing the reasoning process on generating an attribution-centric output. Evaluations on two context-enhanced question-answering datasets using GPT-4 demonstrate improved accuracy and correctness of attributions. In addition, combining our method with fine-tuning improves the response and attribution accuracy of two smaller LLMs, showing their potential to outperform GPT-4 in some cases.
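To make the idea of attribution-oriented Chain-of-Thought prompting concrete, below is a minimal sketch of how such a prompt might be assembled for context-enhanced QA. The template wording, output format, and function names are illustrative assumptions, not the paper's exact method.

```python
# Minimal sketch of an attribution-oriented Chain-of-Thought prompt for
# context-enhanced QA. The instruction wording and the three-step output
# format below are assumptions for illustration, not the paper's template.

ATTRIBUTION_COT_TEMPLATE = """You are given a context and a question.
Reason step by step:
1. Quote the sentences from the context that are relevant to the question.
2. For each quoted sentence, explain how it supports a possible answer.
3. State the final answer, citing the quoted sentences as attributions.

Context:
{context}

Question:
{question}

Reasoning and answer:"""


def build_prompt(context: str, question: str) -> str:
    """Fill the attribution-centric CoT template with one QA instance."""
    return ATTRIBUTION_COT_TEMPLATE.format(context=context, question=question)


if __name__ == "__main__":
    context = (
        "The Amazon rainforest spans nine countries. "
        "Brazil contains about 60 percent of the forest."
    )
    question = "Which country holds the largest share of the Amazon rainforest?"
    # The resulting prompt would be sent to an LLM (e.g., GPT-4), whose
    # response interleaves quoted evidence (attributions) with the answer.
    print(build_prompt(context, question))
```

The key design point is that the model is asked to quote supporting evidence before answering, so the attribution is produced as part of the reasoning chain rather than attached after the fact.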