The main limiting factors in the development of robust multilingual dialogue evaluation metrics are the lack of multilingual data and the limited availability of open-source multilingual dialogue systems. In this work, we propose a workaround for this lack of data by leveraging a strong multilingual pretrained LLM and augmenting existing English dialogue data using Machine Translation (MT). We empirically show that the naive approach of finetuning a pretrained multilingual encoder model with translated data is insufficient to outperform the strong baseline of finetuning a multilingual model with only source data. Instead, the best approach consists of carefully curating the translated data using MT Quality Estimation metrics and excluding low-quality translations that hinder performance.
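The curation step described above can be illustrated with a minimal sketch. It assumes each translated example already carries a score from some MT Quality Estimation model, with higher meaning better. The names `TranslatedExample`, `filter_by_qe`, and `keep_fraction`, as well as the scores in the usage example, are hypothetical and chosen only for illustration. This is not the exact procedure or threshold used in this work.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TranslatedExample:
    source: str       # original English dialogue turn
    translation: str  # machine-translated turn
    qe_score: float   # assumed: output of an MT Quality Estimation model, higher is better


def filter_by_qe(examples: List[TranslatedExample], keep_fraction: float) -> List[TranslatedExample]:
    """Keep the top `keep_fraction` of examples ranked by QE score (hypothetical helper)."""
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be between 0 and 1")
    # Rank from highest to lowest estimated translation quality.
    ranked = sorted(examples, key=lambda ex: ex.qe_score, reverse=True)
    # Truncate so that low-quality translations are excluded from the finetuning data.
    n_keep = int(len(ranked) * keep_fraction)
    return ranked[:n_keep]


# Illustrative data only; the scores are made up.
examples = [
    TranslatedExample("How are you?", "Wie geht es dir?", 0.91),
    TranslatedExample("See you later.", "Bis später.", 0.88),
    TranslatedExample("That's a piece of cake.", "Das ist ein Stück Kuchen.", 0.35),
    TranslatedExample("Break a leg!", "Brich ein Bein!", 0.22),
]
curated = filter_by_qe(examples, keep_fraction=0.5)
```

Ranking and keeping a fixed fraction, rather than applying an absolute score cutoff, is one possible design choice: it keeps the amount of retained data comparable across languages whose QE score distributions differ.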