Neural machine translation (NMT) has shown impressive performance when trained on large-scale corpora. However, generic NMT systems have demonstrated poor performance on out-of-domain translation. To mitigate this issue, several domain adaptation methods have recently been proposed, which often lead to better translation quality than generic NMT systems. While there has been continuous progress in NMT for English and other European languages, domain adaptation in Arabic has received little attention in the literature. The current study therefore aims to explore the effectiveness of domain-specific adaptation for Arabic MT (AMT) in a yet unexplored domain: financial news articles. To this end, we carefully developed a parallel corpus for Arabic-English (AR-EN) translation in the financial domain for benchmarking different domain adaptation methods. We then fine-tuned several pre-trained NMT and large language models, including ChatGPT-3.5 Turbo, on our dataset. The results showed that fine-tuning is successful using just a few well-aligned in-domain AR-EN segments. The quality of ChatGPT translation was superior to that of the other models based on automatic and human evaluations. To the best of our knowledge, this is the first work on fine-tuning ChatGPT towards financial domain transfer learning. To contribute to research in domain translation, we made our datasets and fine-tuned models available at https://huggingface.co/asas-ai/.