Trustworthy Answers, Messier Data: Bridging the Gap in Low-Resource Retrieval-Augmented Generation for Domain Expert Systems

Retrieval-augmented generation (RAG) has become a key technique for enhancing large language models (LLMs) by reducing hallucinations, especially in domain expert systems where LLMs may lack sufficient inherent knowledge. However, developing these systems in low-resource settings introduces several challenges: (1) handling heterogeneous data sources, (2) optimizing the retrieval phase for trustworthy answers, and (3) evaluating generated answers across diverse aspects. To address these, we introduce a data generation pipeline that transforms raw multi-modal data into a structured corpus and Q&A pairs, an advanced re-ranking phase that improves retrieval precision, and a reference matching algorithm that enhances answer traceability. Applied to the automotive engineering domain, our system improves factual correctness (+1.94), informativeness (+1.16), and helpfulness (+1.67) over a non-RAG baseline, as scored on a 1-5 scale by an LLM judge. These results highlight the effectiveness of our approach across distinct aspects, with strong answer grounding and transparency.
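The abstract names a re-ranking phase and a reference matching algorithm but does not describe how either works. The sketch below is only an illustration of the general retrieve, re-rank, and match-references pattern, not the authors' method. Every name in it (`overlap_score`, `cosine_score`, `retrieve_and_rerank`, `match_references`, the `threshold` value) is hypothetical, and simple bag-of-words scoring stands in for whatever retriever and re-ranker the paper actually uses.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query, passage):
    """Stage 1 (assumed): cheap score, fraction of query tokens present in the passage."""
    q, p = set(tokenize(query)), set(tokenize(passage))
    return len(q & p) / len(q) if q else 0.0


def cosine_score(a, b):
    """Stage 2 (assumed): bag-of-words cosine similarity, a stand-in for a learned re-ranker."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def retrieve_and_rerank(query, corpus, k_retrieve=3, k_final=2):
    """Retrieve k_retrieve candidates with the cheap score, then re-order them
    with the second score and keep the top k_final document ids."""
    candidates = sorted(
        corpus, key=lambda d: overlap_score(query, corpus[d]), reverse=True
    )[:k_retrieve]
    reranked = sorted(
        candidates, key=lambda d: cosine_score(query, corpus[d]), reverse=True
    )
    return reranked[:k_final]


def match_references(answer, corpus, doc_ids, threshold=0.3):
    """Pair each answer sentence with its most similar retrieved passage,
    or with None when no passage reaches the (hypothetical) threshold."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    matches = []
    for sentence in sentences:
        best = max(doc_ids, key=lambda d: cosine_score(sentence, corpus[d]), default=None)
        if best is not None and cosine_score(sentence, corpus[best]) >= threshold:
            matches.append((sentence, best))
        else:
            matches.append((sentence, None))
    return matches
```

Sentences paired with `None` are the ones a reader cannot trace back to a source passage, which is the kind of signal a traceability step is meant to surface.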

