Fumbling in Babel: An Investigation into ChatGPT's Language Identification Ability

ChatGPT has recently emerged as a powerful NLP tool that can carry out a variety of tasks. However, the range of languages ChatGPT can handle remains largely a mystery. To uncover which languages ChatGPT 'knows', we investigate its language identification (LID) abilities. For this purpose, we compile Babel-670, a benchmark comprising 670 languages representing 24 language families spoken across five continents. Languages in Babel-670 run the gamut from the very high-resource to the very low-resource. We then study the ability of ChatGPT (both GPT-3.5 and GPT-4) to identify (i) language names and language codes, (ii) under zero- and few-shot conditions, (iii) with and without provision of a label set. When compared to smaller finetuned LID tools, we find that ChatGPT lags behind. For example, it performs poorly on African languages. We conclude that current large language models would benefit from further development before they can sufficiently serve diverse communities.
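The three experimental axes named in the abstract (name vs. code, zero- vs. few-shot, with vs. without a label set) can be pictured as switches on a single prompt builder. The following is a minimal sketch under stated assumptions, not the authors' actual setup: the prompt wording, the function names `build_lid_prompt` and `accuracy`, and the sample texts and labels are all hypothetical and do not come from the paper.

```python
from typing import Optional, Sequence, Tuple


def build_lid_prompt(
    text: str,
    target: str = "name",
    examples: Optional[Sequence[Tuple[str, str]]] = None,
    label_set: Optional[Sequence[str]] = None,
) -> str:
    """Assemble one language-identification prompt.

    The wording is illustrative only; the paper's real prompts are not
    given in the abstract.
    """
    if target not in ("name", "code"):
        raise ValueError("target must be 'name' or 'code'")
    what = "language name" if target == "name" else "language code"
    lines = [f"Identify the {what} of the text below. Answer with the {what} only."]
    # Condition (iii): optionally restrict the answer to a given label set.
    if label_set:
        lines.append("Choose one of: " + ", ".join(label_set))
    # Condition (ii): zero-shot when no examples are given, few-shot otherwise.
    for example_text, example_label in examples or []:
        lines.append(f"Text: {example_text}")
        lines.append(f"Answer: {example_label}")
    lines.append(f"Text: {text}")
    lines.append("Answer:")
    return "\n".join(lines)


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Exact-match accuracy after trimming whitespace and case-folding."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(
        p.strip().casefold() == g.strip().casefold()
        for p, g in zip(predictions, gold)
    )
    return hits / len(gold)


if __name__ == "__main__":
    # Few-shot, code-prediction prompt with a (made-up) two-item label set.
    print(
        build_lid_prompt(
            "Habari ya asubuhi",
            target="code",
            examples=[("Good morning", "eng")],
            label_set=["swh", "eng"],
        )
    )
```

Sending the resulting string to a model and comparing its replies with `accuracy` would give one cell of the name/code by shot-count by label-set grid; exact-match scoring is an assumption here, since the abstract does not name the metric used.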

