Leveraging the Domain Adaptation of Retrieval Augmented Generation Models for Question Answering and Reducing Hallucination

While ongoing advancements in Large Language Models have demonstrated remarkable success across various NLP tasks, Retrieval Augmented Generation Model stands out to be highly effective on downstream applications like Question Answering. Recently, RAG-end2end model further optimized the architecture and achieved notable performance improvements on domain adaptation. However, the effectiveness of these RAG-based architectures remains relatively unexplored when fine-tuned on specialized domains such as customer service for building a reliable conversational AI system. Furthermore, a critical challenge persists in reducing the occurrence of hallucinations while maintaining high domain-specific accuracy. In this paper, we investigated the performance of diverse RAG and RAG-like architectures through domain adaptation and evaluated their ability to generate accurate and relevant response grounded in the contextual knowledge base. To facilitate the evaluation of the models, we constructed a novel dataset HotelConvQA, sourced from wide range of hotel-related conversations and fine-tuned all the models on our domain specific dataset. We also addressed a critical research gap on determining the impact of domain adaptation on reducing hallucinations across different RAG architectures, an aspect that was not properly measured in prior work. Our evaluation shows positive results in all metrics by employing domain adaptation, demonstrating strong performance on QA tasks and providing insights into their efficacy in reducing hallucinations. Our findings clearly indicate that domain adaptation not only enhances the models' performance on QA tasks but also significantly reduces hallucination across all evaluated RAG architectures.

翻译：尽管大型语言模型的持续进展已在各种自然语言处理任务中展现出显著成功，但检索增强生成模型在问答等下游应用中表现出卓越效能。近期，RAG-end2end模型进一步优化了架构，在领域自适应方面取得了显著性能提升。然而，当针对客户服务等专业领域进行微调以构建可靠对话AI系统时，这些基于RAG的架构有效性仍未得到充分探索。此外，在保持高领域特定准确性的同时减少幻觉发生这一关键挑战依然存在。本文通过领域自适应研究了多种RAG及类RAG架构的性能，并评估了其基于上下文知识库生成准确相关响应的能力。为促进模型评估，我们构建了新颖的HotelConvQA数据集，该数据集源自广泛的酒店相关对话，并将所有模型在我们的领域特定数据集上进行了微调。我们还解决了衡量领域自适应对不同RAG架构减少幻觉影响的关键研究空白——这一方面在先前工作中未得到充分评估。实验结果表明，采用领域自适应后所有指标均呈现积极结果，在问答任务中展现出强劲性能，并揭示了其在减少幻觉方面的有效性。我们的研究明确表明，领域自适应不仅能提升模型在问答任务中的性能，还能显著降低所有评估RAG架构中的幻觉现象。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

专知会员服务

13+阅读 · 2022年3月12日

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日