Is ChatGPT A Good Translator? Yes With GPT-4 As The Engine

This report provides a preliminary evaluation of ChatGPT for machine translation, covering translation prompts, multilingual translation, and translation robustness. We adopt the prompts suggested by ChatGPT itself to trigger its translation ability and find that the candidate prompts generally work well, with only minor performance differences. Evaluating on a number of benchmark test sets, we find that ChatGPT performs competitively with commercial translation products (e.g., Google Translate) on high-resource European languages but lags significantly behind on low-resource or distant languages. As for translation robustness, ChatGPT does not perform as well as the commercial systems on biomedical abstracts or Reddit comments but achieves good results on spoken language. Further, we explore an interesting strategy named $\mathbf{pivot~prompting}$ for distant languages, which asks ChatGPT to translate the source sentence into a high-resource pivot language before translating it into the target language; this noticeably improves translation performance. With the launch of the GPT-4 engine, the translation performance of ChatGPT is significantly boosted, becoming comparable to commercial translation products even for distant languages. A human analysis of Google Translate and ChatGPT suggests that ChatGPT with GPT-3.5 tends to produce more hallucinations and mistranslation errors, while ChatGPT with GPT-4 makes the fewest errors. In other words, ChatGPT has already become a good translator. Please refer to our GitHub project for more details: https://github.com/wxjiao/Is-ChatGPT-A-Good-Translator
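
Pivot prompting is essentially a change in how the translation request is phrased. The Python sketch below illustrates two possible readings of it: a single prompt that asks for the pivot translation first and then the target translation, and a two-call chain that feeds the pivot output back to the model. The prompt wording, the default pivot language (English), the function names `direct_prompt`, `pivot_prompt`, and `pivot_translate`, and the `ask_model` callable (a stand-in for whatever chat API is used) are all hypothetical illustrations, not the report's exact prompts or code.

```python
from typing import Callable, List

# Hypothetical default: English as the high-resource pivot language.
DEFAULT_PIVOT = "English"


def direct_prompt(tgt_lang: str, sentences: List[str]) -> str:
    """Build a single-step prompt asking for a direct translation into tgt_lang.
    The wording is an illustrative assumption."""
    body = "\n".join(sentences)
    return f"Please provide the {tgt_lang} translation for these sentences:\n{body}"


def pivot_prompt(tgt_lang: str, sentences: List[str],
                 pivot_lang: str = DEFAULT_PIVOT) -> str:
    """Build ONE prompt that asks the model to translate into the pivot
    language first and then into the target language (assumed wording)."""
    body = "\n".join(sentences)
    return (
        f"Please provide the {pivot_lang} translation first and then the "
        f"{tgt_lang} translation for these sentences one by one:\n{body}"
    )


def pivot_translate(sentences: List[str], tgt_lang: str,
                    ask_model: Callable[[str], str],
                    pivot_lang: str = DEFAULT_PIVOT) -> str:
    """Two-call variant: source -> pivot, then pivot -> target.
    `ask_model` is a hypothetical callable that sends a prompt to a chat
    model and returns its text reply."""
    pivot_text = ask_model(direct_prompt(pivot_lang, sentences))
    return ask_model(direct_prompt(tgt_lang, pivot_text.splitlines()))


if __name__ == "__main__":
    print(pivot_prompt("Romanian", ["Das Wetter ist heute schön."]))
```

The single-prompt form keeps the whole exchange in one request, while the two-call form makes the intermediate pivot output explicit and easy to inspect; which one works better for a given language pair is an empirical question this sketch does not answer.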

翻译：本报告对ChatGPT在机器翻译中的表现进行了初步评估，涵盖翻译提示、多语言翻译和翻译鲁棒性。我们采用ChatGPT自身建议的提示来触发其翻译能力，发现候选提示整体表现良好，仅存在微小性能差异。通过在多个基准测试集上的评估，我们发现在高资源欧洲语言上，ChatGPT与商业翻译产品（如谷歌翻译）具有竞争力，但在低资源或语言距离较远的语言上显著落后。在翻译鲁棒性方面，ChatGPT在生物医学摘要或Reddit评论上的表现不如商业系统，但在口语文本上表现出良好结果。此外，我们探索了一种针对远距离语言的有趣策略，称为$\mathbf{枢轴提示}$（pivot prompting），该策略要求ChatGPT先将源语言句子翻译成高资源枢轴语言，再翻译成目标语言，从而显著提升了翻译性能。随着GPT-4引擎的推出，ChatGPT的翻译性能得到大幅提升，即便对于远距离语言，其表现也可与商业翻译产品相媲美。对谷歌翻译和ChatGPT的人工分析表明，搭载GPT-3.5的ChatGPT更容易产生幻觉和误译错误，而搭载GPT-4的ChatGPT错误最少。换言之，ChatGPT已成为一个优秀的翻译器。更多详情请参阅我们的GitHub项目：https://github.com/wxjiao/Is-ChatGPT-A-Good-Translator