Translation Quality Estimation (TQE) is an important step before deploying the output translation into usage. TQE is also critical in assessing machine translation (MT) and human translation (HT) quality without seeing the reference translations. In this work, we examine if the state-of-the-art large language models (LLMs) can be fine-tuned for the TQE task and their capability. We take ChatGPT as one example and approach TQE as a binary classification task. Using English-Italian and English-German training corpus, our experimental results show that fine-tuned ChatGPT via its API can achieve a relatively high score on predicting translation quality, i.e. if the translation needs to be edited, but there is definitely space to improve the accuracy. English-Italiano bilingual Abstract is available in the paper.
翻译:翻译质量评估(TQE)是将输出翻译部署使用前的重要步骤。在未见参考译文的情况下,TQE对于评估机器翻译(MT)和人工翻译(HT)质量同样至关重要。本研究探究最先进的大语言模型(LLMs)能否通过微调用于TQE任务及其能力表现。我们以ChatGPT为例,将TQE作为二分类任务进行处理。基于英-意和英-德训练语料的实验结果表明,通过API微调的ChatGPT在预测翻译质量(即是否需要编辑译文)方面能达到相对较高的分数,但准确性仍有提升空间。论文中附有英意双语摘要。