This report provides a preliminary evaluation of ChatGPT for machine translation, covering translation prompts, multilingual translation, and translation robustness. We adopt the prompts suggested by ChatGPT itself to trigger its translation ability and find that the candidate prompts generally work well, with only minor performance differences among them. By evaluating on a number of benchmark test sets, we find that ChatGPT performs competitively with commercial translation products (e.g., Google Translate) on high-resource European languages but lags significantly behind on low-resource or distant languages. For distant languages, we explore an interesting strategy named $\mathbf{pivot~prompting}$, which asks ChatGPT to translate the source sentence into a high-resource pivot language before translating it into the target language; this improves translation performance significantly. As for translation robustness, ChatGPT does not perform as well as the commercial systems on biomedical abstracts or Reddit comments, but it performs well on spoken language. With the launch of the GPT-4 engine, the translation performance of ChatGPT is significantly boosted and becomes comparable to that of commercial translation products, even for distant languages. In other words, $\mathbf{ChatGPT~has~already~become~a~good~translator!}$ Scripts and data: https://github.com/wxjiao/Is-ChatGPT-A-Good-Translator
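As an illustration of the pivot-prompting idea, the sketch below routes a sentence through a pivot language as two chained requests. This is only one possible realization under stated assumptions: the prompt wording in `DIRECT_TEMPLATE` is hypothetical (not necessarily the authors' exact prompt), the abstract's strategy may equally be issued as a single combined prompt, and `ask` is a hypothetical placeholder for whatever function sends a prompt to the model and returns its reply (no particular ChatGPT client API is assumed).

```python
from typing import Callable

# Hypothetical prompt wording; an assumption for illustration only.
DIRECT_TEMPLATE = "Please provide the {tgt} translation for the following sentence:\n{text}"


def direct_prompt(text: str, tgt: str) -> str:
    """Build a single-step prompt asking for a translation into `tgt`."""
    return DIRECT_TEMPLATE.format(tgt=tgt, text=text)


def pivot_translate(text: str, pivot: str, tgt: str, ask: Callable[[str], str]) -> str:
    """Translate `text` into `tgt` by way of a high-resource `pivot` language.

    `ask` is a hypothetical callable that sends a prompt to the model and
    returns the model's reply as a string.
    """
    # Step 1: source language -> high-resource pivot language (e.g. English).
    pivot_text = ask(direct_prompt(text, pivot))
    # Step 2: pivot language -> final target language.
    return ask(direct_prompt(pivot_text, tgt))
```

Passing the model call in as `ask` keeps the prompting logic separate from any specific client, so the same two-step flow can be exercised with a stub model before connecting it to a real one.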