Idiomatic translation remains a significant challenge in machine translation, especially for low-resource languages such as Urdu, and has received limited prior attention. To advance research in this area, we introduce the first evaluation datasets for Urdu-to-English idiomatic translation, covering both Native Urdu and Roman Urdu scripts and annotated with gold-standard English equivalents. We evaluate multiple open-source Large Language Models (LLMs) and Neural Machine Translation (NMT) systems on this task, focusing on their ability to preserve idiomatic and cultural meaning. We assess translation quality with automatic metrics, including BLEU, BERTScore, COMET, and XCOMET. Our findings indicate that prompt engineering improves idiomatic translation compared to direct translation, though performance differences among prompt types are relatively minor. Moreover, cross-script comparisons reveal that text representation substantially affects translation quality, with Native Urdu inputs producing more accurate idiomatic translations than Roman Urdu inputs.