Multilingual Large Language Models (LLMs) have recently shown strong capabilities across a wide range of tasks, achieving state-of-the-art performance with few-shot or zero-shot prompting. While these models have been studied extensively on tasks where the input is assumed to be in a single language, less attention has been paid to their performance when the input involves code-switching (CSW). In this paper, we present an extensive empirical study of various multilingual LLMs and benchmark their performance on three tasks: sentiment analysis, machine translation, and word-level language identification. Our findings indicate that although multilingual LLMs show promising results on certain tasks with zero-/few-shot prompting, on average they still underperform smaller fine-tuned models. We argue that "multilingual" LLMs are not necessarily code-switching compatible, and that extensive future research is required to fully bridge this gap.