We propose iteratively prompting a large language model (LLM) to self-correct a translation, motivated by the strong language understanding and translation capabilities of LLMs and by the iterative, human-like way in which translators revise a draft. Interestingly, multi-turn querying lowers the output's scores on string-based metrics, while neural metrics suggest comparable or improved quality. Human evaluations indicate better fluency and naturalness than both the initial translations and even the human references, with overall quality maintained. Ablation studies underscore that preserving quality depends on anchoring the refinement to the source text and on starting from a reasonable seed translation. We also discuss the challenges of evaluation and how the refined output relates to human performance and translationese.
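The refinement loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt wording, the function names `build_refine_prompt` and `iterative_refine`, the default of three rounds, and the `llm` callable (any function mapping a prompt string to a completion string) are all hypothetical. The one design point taken from the abstract is that every round includes the source text alongside the current draft, so that refinement stays anchored to the source rather than drifting into free paraphrase.

```python
from typing import Callable, List


def build_refine_prompt(source: str, draft: str, src_lang: str, tgt_lang: str) -> str:
    """Build one refinement query. The wording is a hypothetical example."""
    return (
        f"Source ({src_lang}): {source}\n"
        f"Current translation ({tgt_lang}): {draft}\n"
        f"Please improve the {tgt_lang} translation of the source. "
        "Reply with the improved translation only."
    )


def iterative_refine(
    source: str,
    seed_translation: str,
    llm: Callable[[str], str],  # assumption: any prompt -> completion function
    rounds: int = 3,            # assumption: the number of rounds is a free choice
    src_lang: str = "source language",
    tgt_lang: str = "target language",
) -> List[str]:
    """Return every draft, starting with the seed translation.

    Each round sends the source and the latest draft back to the model,
    so the refinement is always conditioned on the source text.
    """
    drafts = [seed_translation]
    for _ in range(rounds):
        prompt = build_refine_prompt(source, drafts[-1], src_lang, tgt_lang)
        drafts.append(llm(prompt).strip())
    return drafts
```

Returning all drafts rather than only the last one makes it straightforward to score each round separately, for example to compare how string-based and neural metrics move across iterations.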