ChatGPT shows remarkable capabilities for machine translation (MT). Several prior studies have shown that it achieves results comparable to commercial systems for high-resource languages, but lags behind on harder tasks, e.g., low-resource translation and translation between distant language pairs. However, these studies usually adopt simple prompts, which cannot fully elicit ChatGPT's capability. In this paper, we aim to further draw out ChatGPT's translation ability by revisiting three aspects: temperature, task information, and domain information. Correspondingly, we propose an optimal temperature setting and two simple but effective prompts: Task-Specific Prompts (TSP) and Domain-Specific Prompts (DSP). We show that: 1) the performance of ChatGPT depends largely on temperature, and a lower temperature usually yields better performance; 2) emphasizing the task information further improves ChatGPT's performance, particularly on complex MT tasks; 3) introducing domain information elicits ChatGPT's generalization ability and improves its performance in the specific domain; 4) ChatGPT tends to generate hallucinations on non-English-centric MT tasks, a problem that our proposed prompts partially address but that still deserves the attention of the MT/NLP community. We also explore the effects of advanced in-context learning strategies and report a negative but interesting observation: the powerful chain-of-thought prompt leads to word-by-word translation behavior and thus causes significant degradation in translation quality.
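As a rough illustration of the three aspects named above (temperature, task information, domain information), the following Python sketch assembles a translation request. It is not the paper's implementation: the prompt wording, the `build_request` helper, the `TranslationRequest` type, and the default temperature of `0.0` are all assumptions made for this example, and no real API is called.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranslationRequest:
    """Hypothetical container for what would be sent to a chat model."""
    prompt: str
    temperature: float


def build_request(
    source_text: str,
    src_lang: str,
    tgt_lang: str,
    domain: Optional[str] = None,
    emphasize_task: bool = True,
    temperature: float = 0.0,  # assumed default; the abstract only says lower is usually better
) -> TranslationRequest:
    """Assemble a prompt with optional task and domain information."""
    parts = []
    if emphasize_task:
        # Task-specific hint (illustrative wording, not the paper's template).
        parts.append("You are a machine translation system.")
    if domain:
        # Domain-specific hint (illustrative wording, not the paper's template).
        parts.append(f"The text to translate is from the {domain} domain.")
    parts.append(f"Translate the following sentence from {src_lang} to {tgt_lang}:")
    parts.append(source_text)
    return TranslationRequest(prompt="\n".join(parts), temperature=temperature)


if __name__ == "__main__":
    request = build_request("Guten Morgen.", "German", "English", domain="news")
    print(request.temperature)
    print(request.prompt)
```

Keeping the task hint and the domain hint as separate optional lines makes it easy to compare a plain prompt against each variant in isolation, which mirrors how the abstract treats them as distinct factors.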