Multilingual Large Language Models (LLMs) have recently shown great capabilities in a wide range of tasks, exhibiting state-of-the-art performance through zero-shot or few-shot prompting methods. While their abilities in monolingual tasks have been studied extensively, their potential in the context of code-switching (CSW), the practice of alternating languages within an utterance, remains relatively unexplored. In this paper, we provide a comprehensive empirical analysis of various multilingual LLMs, benchmarking their performance across four tasks: sentiment analysis, machine translation, summarization, and word-level language identification. Our results indicate that although multilingual LLMs show promising results in certain tasks using zero-shot or few-shot prompting, they still underperform fine-tuned models of much smaller scale. We argue that current "multilingualism" in LLMs does not inherently imply proficiency with code-switching texts, calling for future research to bridge this discrepancy.