TARJAMAT: Evaluation of Bard and ChatGPT on Machine Translation of Ten Arabic Varieties

Large language models (LLMs) finetuned to follow human instructions have recently emerged as a breakthrough in AI. Models such as Google Bard and OpenAI ChatGPT, for example, are surprisingly powerful tools for question answering, code debugging, and dialogue generation. Despite the purported multilingual proficiency of these models, their linguistic inclusivity remains insufficiently explored. To address this gap, we present a thorough assessment of Bard and ChatGPT (encompassing both GPT-3.5 and GPT-4) regarding their machine translation proficiency across ten varieties of Arabic. Our evaluation covers diverse Arabic varieties such as Classical Arabic, Modern Standard Arabic, and several dialectal variants. Furthermore, we undertake a human-centric study to scrutinize how well the most recent model, Bard, follows human instructions during translation tasks. Our analysis indicates that LLMs may encounter challenges with certain Arabic dialects, particularly those for which minimal public data exists, such as Algerian and Mauritanian dialects. However, they exhibit satisfactory performance with more prevalent dialects, albeit occasionally trailing behind established commercial systems like Google Translate. Additionally, our analysis reveals that Bard has only a limited ability to align with human instructions in translation contexts. Collectively, our findings underscore that prevailing LLMs remain far from inclusive, with only limited ability to cater for the linguistic and cultural intricacies of diverse communities.

