Learned metrics such as BLEURT have become widely used in recent years to evaluate the quality of machine translation systems. Training such metrics requires data that can be expensive and difficult to acquire, particularly for lower-resource languages. We show how knowledge can be distilled from Large Language Models (LLMs) to improve such learned metrics without human annotators: we create synthetic datasets that can be mixed into existing datasets, which requires only a corpus of text in the target language. We show that this approach can improve the performance of a BLEURT-like model on lower-resource languages.