Companies regularly have to contend with multi-release systems, where several versions of the same software are in operation simultaneously. Question answering over documents from multi-release systems poses challenges because different releases have distinct yet overlapping documentation. Motivated by the observed inaccuracy of state-of-the-art question-answering techniques on multi-release system documents, we propose QAMR, a chatbot designed to answer questions across multi-release system documentation. QAMR enhances traditional retrieval-augmented generation (RAG) to ensure accuracy in the face of highly similar yet distinct documentation for different releases. It achieves this through a novel combination of pre-processing, query rewriting, and context selection. In addition, QAMR employs a dual-chunking strategy to enable separately tuned chunk sizes for retrieval and answer generation, improving overall question-answering accuracy. We evaluate QAMR using a public software-engineering benchmark as well as a collection of real-world, multi-release system documents from our industry partner, Ciena. Our evaluation yields five main findings: (1) QAMR outperforms a baseline RAG-based chatbot, achieving an average answer correctness of 88.5% and an average retrieval accuracy of 90%, which correspond to improvements of 16.5% and 12%, respectively. (2) An ablation study shows that QAMR's mechanisms for handling multi-release documents directly improve answer accuracy. (3) Compared to its component-ablated variants, QAMR achieves a 19.6% average gain in answer correctness and a 14.0% average gain in retrieval accuracy over the best ablation. (4) QAMR reduces response time by 8% on average relative to the baseline. (5) The automatically computed accuracy metrics used in our evaluation strongly correlate with expert human assessments, validating the reliability of our methodology.
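The dual-chunking strategy mentioned above can be sketched in a few lines. The following is a hypothetical illustration only, not QAMR's actual implementation: the function names, the chunk sizes, and the simple word-overlap scoring used in place of a real embedding-based retriever are all assumptions. The core idea it demonstrates is indexing small chunks for precise retrieval while passing each matched chunk's larger parent chunk to the answer-generation step.

```python
# Hypothetical sketch of a dual-chunking RAG index: small chunks are
# scored for retrieval, but each hit is mapped back to its larger
# parent chunk, which is what the generator would receive as context.
# Sizes and the overlap-based scorer are illustrative assumptions.

def chunk(text, size):
    """Split text into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def build_dual_index(document, retrieval_size=8, generation_size=32):
    """Index small chunks for retrieval; keep a map to large parent chunks."""
    parents = chunk(document, generation_size)
    index = []  # list of (small_chunk, parent_id) pairs
    for pid, parent in enumerate(parents):
        for small in chunk(parent, retrieval_size):
            index.append((small, pid))
    return index, parents

def retrieve(query, index, parents, k=1):
    """Score small chunks by word overlap; return their parent chunks."""
    q = set(query.lower().split())
    scored = sorted(index,
                    key=lambda e: len(q & set(e[0].lower().split())),
                    reverse=True)
    parent_ids = []
    for _, pid in scored[:k]:
        if pid not in parent_ids:
            parent_ids.append(pid)
    return [parents[pid] for pid in parent_ids]
```

In this sketch, `retrieval_size` and `generation_size` can be tuned independently, which is the property the abstract attributes to dual chunking; a production system would replace the overlap scorer with a vector similarity search.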


