Large language models (LLMs) have gained popularity and achieved remarkable results in open-domain tasks, but their performance in real industrial, domain-specific scenarios is only average because they lack domain-specific knowledge. This issue has attracted widespread attention, yet few relevant benchmarks are available. In this paper, we provide a benchmark question answering (QA) dataset named MSQA, which covers Microsoft products and the IT technical problems that customers encounter. The dataset contains industry cloud-specific QA knowledge that is not available to general-purpose LLMs, so it is well suited for evaluating methods aimed at improving the domain-specific capabilities of LLMs. In addition, we propose a new model interaction paradigm that empowers an LLM to perform better on domain-specific tasks in which it is not proficient. Extensive experiments demonstrate that the approach following our model fusion framework outperforms commonly used methods that combine LLMs with retrieval.