Pre-trained language models have become an integral component of question-answering systems, achieving remarkable performance. For practical deployment, knowledge distillation is critical to preserving high performance under computational constraints. In this paper, we address a key question: given the importance of unsupervised distillation for student performance, how does one effectively ensemble knowledge from multiple teachers at this stage without the guidance of ground-truth labels? We propose a novel algorithm, GOVERN, to tackle this issue. GOVERN demonstrates significant improvements in both offline and online experiments, and the proposed algorithm has been successfully deployed in a real-world commercial question-answering system.
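To make the problem setting concrete, the sketch below illustrates naive unsupervised multi-teacher distillation, where a student is trained to match the averaged soft predictions of several teachers on unlabeled data; this is only an assumed baseline formulation of the setting the abstract describes, not the GOVERN algorithm itself, and all function names, the temperature value, and the averaging strategy are illustrative assumptions.

```python
# A minimal sketch of unsupervised multi-teacher distillation (NOT GOVERN):
# the student matches the teachers' averaged soft labels on unlabeled data,
# with no ground-truth labels involved. All names here are illustrative.
import torch
import torch.nn.functional as F

def naive_ensemble_distill_step(student, teachers, batch, optimizer, temperature=2.0):
    """One training step on an unlabeled batch using ensembled teacher soft labels."""
    with torch.no_grad():
        # Average the teachers' probability distributions (a naive ensembling choice).
        teacher_probs = torch.stack(
            [F.softmax(t(batch) / temperature, dim=-1) for t in teachers]
        ).mean(dim=0)

    student_log_probs = F.log_softmax(student(batch) / temperature, dim=-1)
    # KL divergence between the student and the ensembled teacher distribution.
    loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature ** 2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```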