Large Language Models (LLMs) have gained popularity and achieved remarkable results on open-domain tasks, but their performance in real industrial domain-specific scenarios is mediocre because they lack domain-specific knowledge. This issue has attracted widespread attention, yet few relevant benchmarks are available. In this paper, we introduce MSQA, a benchmark Question Answering (QA) dataset covering Microsoft products and the IT technical problems customers encounter with them. The dataset contains industry cloud-specific QA knowledge that general-purpose LLMs do not have, which makes it well suited for evaluating methods that aim to improve the domain-specific capabilities of LLMs. In addition, we propose a new model interaction paradigm that enables an LLM to perform better on domain-specific tasks in which it is not proficient. Extensive experiments demonstrate that the approach following our model fusion framework outperforms the commonly used approach of augmenting LLMs with retrieval.