Translation Quality Estimation (TQE) is an important step before deploying the output translation into usage. TQE is also critical in assessing machine translation (MT) and human translation (HT) quality without seeing the reference translations. In this work, we examine if the state-of-the-art large language models (LLMs) can be fine-tuned for the TQE task and their capability. We take ChatGPT as one example and approach TQE as a binary classification task. Using English to Italian, German, French, Japanese, Dutch, Portuguese, Turkish, and Chinese training corpora, our experimental results show that fine-tuned ChatGPT via its API can achieve a relatively high score on predicting translation quality, i.e. if the translation needs to be edited, but there is definitely much space to improve the accuracy. English-Italiano bilingual Abstract is available in the paper.
翻译:翻译质量估计(TQE)是将翻译输出投入实际应用前的重要步骤。TQE在无需参考译文的情况下评估机器翻译(MT)和人工翻译(HT)质量时也至关重要。本研究探讨了最先进的大语言模型(LLMs)能否针对TQE任务进行微调及其能力表现。我们以ChatGPT为例,将TQE视为二分类任务。基于英译意大利语、德语、法语、日语、荷兰语、葡萄牙语、土耳其语和中文的训练语料,实验结果表明:通过API微调的ChatGPT在预测翻译质量(即判断译文是否需要编辑)方面可获得较高分数,但精度提升空间依然显著。论文提供英意双语摘要。