Large Language Models (LLMs) have demonstrated excellent performance on Machine Translation Quality Estimation (MTQE), yet their high inference cost makes them impractical for direct application. In this work, we propose using LLMs to generate MQM-style annotations for training a COMET model: following Fernandes et al. (2023), we argue that segment-level annotations provide a strong rationale for LLMs and are key to good segment-level QE. We propose a simplified MQM scheme, largely restricted to top-level error categories, to guide the LLM's annotation choices. We present a systematic approach to developing a GPT-4o-based prompt, PPbMQM (Prompt-Pattern-based MQM). We show that the resulting annotations correlate well with human annotations and that training COMET on them yields competitive segment-level QE performance for Chinese-English and English-German.