Large language models (LLMs) have demonstrated strong ability in machine translation (MT), yet they suffer from high computational cost and latency. Transferring translation knowledge from giant LLMs to medium-sized MT models is therefore a promising research direction. However, traditional knowledge distillation methods do not take the capabilities of the student and teacher models into consideration: they repeatedly teach the student knowledge it has already learned, and they fail to extend to novel contexts and knowledge. In this paper, we propose a framework called MT-Patcher, which transfers knowledge from LLMs to existing MT models in a selective, comprehensive and proactive manner. Considering the current translation ability of the student MT model, we identify and correct only its translation errors, instead of distilling the whole translation from the teacher. Leveraging the strong language abilities of LLMs, we instruct LLM teachers to synthesize diverse contexts and to anticipate further potential errors for the student. Experimental results on both specific language phenomena and general MT benchmarks demonstrate that finetuning the student MT model on only about 10% of the examples achieves results comparable to the traditional knowledge distillation method, and that the synthesized potential errors and diverse contexts further improve translation performance on unseen contexts and words.
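The selective step described above, keeping only the examples on which the student makes an error, can be illustrated with a minimal Python sketch. This is not the MT-Patcher implementation: the names `Feedback`, `build_patch_set`, `toy_student` and `toy_teacher` are hypothetical, and the teacher is replaced by a toy lookup table rather than an LLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Feedback:
    """Hypothetical teacher verdict on one student translation."""
    has_error: bool
    corrected: str


def build_patch_set(
    sources: List[str],
    student_translate: Callable[[str], str],
    teacher_review: Callable[[str, str], Feedback],
) -> List[Tuple[str, str]]:
    """Collect (source, corrected translation) pairs only where the
    teacher flags an error in the student's output."""
    patch_set = []
    for src in sources:
        hypothesis = student_translate(src)
        feedback = teacher_review(src, hypothesis)
        if feedback.has_error:
            # Selective: examples the student already translates
            # correctly are skipped, not distilled again.
            patch_set.append((src, feedback.corrected))
    return patch_set


# Toy stand-ins (assumptions for illustration only): the "student" and
# the "teacher" are lookup tables instead of real MT and LLM models.
TOY_STUDENT = {"guten Morgen": "good morning", "Katze": "dog"}
TOY_REFERENCE = {"guten Morgen": "good morning", "Katze": "cat"}


def toy_student(src: str) -> str:
    return TOY_STUDENT[src]


def toy_teacher(src: str, hypothesis: str) -> Feedback:
    reference = TOY_REFERENCE[src]
    return Feedback(has_error=(hypothesis != reference), corrected=reference)


patch_set = build_patch_set(list(TOY_STUDENT), toy_student, toy_teacher)
print(patch_set)
```

In this toy setting only the mistranslated source ends up in the finetuning set, which mirrors the idea of training on a small fraction of the data instead of on every teacher translation.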