State-of-the-art performance in question-answering (QA) tasks is currently achieved by systems employing Large Language Models (LLMs); however, these models tend to hallucinate information in their responses. One approach to mitigating this enhances the generation process by attributing the output to the given input. However, identifying appropriate attributions and verifying their accuracy against a source is a complex task, and the evaluation of such systems still requires significant improvement. We introduce an attribution-oriented Chain-of-Thought reasoning method to enhance the accuracy of attributions. This approach focuses the reasoning process on generating an attribution-centric output. Evaluations on two context-enhanced question-answering datasets using GPT-4 demonstrate improved accuracy and correctness of attributions. In addition, combining our method with fine-tuning improves the response and attribution accuracy of two smaller LLMs, showing their potential to outperform GPT-4 in some cases.