Question Answering for Multi-Release Systems: A Case Study at Ciena

Companies regularly have to contend with multi-release systems, where several versions of the same software are in operation simultaneously. Question answering over documents from multi-release systems poses challenges because different releases have distinct yet overlapping documentation. Motivated by the observed inaccuracy of state-of-the-art question-answering techniques on multi-release system documents, we propose QAMR, a chatbot designed to answer questions across multi-release system documentation. QAMR enhances traditional retrieval-augmented generation (RAG) to ensure accuracy in the face of highly similar yet distinct documentation for different releases. It achieves this through a novel combination of pre-processing, query rewriting, and context selection. In addition, QAMR employs a dual-chunking strategy that allows chunk sizes to be tuned separately for retrieval and for answer generation, improving overall question-answering accuracy. We evaluate QAMR using a public software-engineering benchmark as well as a collection of real-world, multi-release system documents from our industry partner, Ciena. Our evaluation yields five main findings: (1) QAMR outperforms a baseline RAG-based chatbot, achieving an average answer correctness of 88.5% and an average retrieval accuracy of 90%, which correspond to improvements of 16.5% and 12%, respectively. (2) An ablation study shows that QAMR's mechanisms for handling multi-release documents directly improve answer accuracy. (3) QAMR achieves a 19.6% average gain in answer correctness and a 14.0% average gain in retrieval accuracy over the best of its component-ablated variants. (4) QAMR reduces response time by 8% on average relative to the baseline. (5) The automatically computed accuracy metrics used in our evaluation strongly correlate with expert human assessments, validating the reliability of our methodology.
