Existing large language models (LLMs) struggle to support many low-resource languages, particularly extremely low-resource ones for which too little training data exists for effective parameter updating. We therefore investigate whether LLMs can learn a new language on the fly solely through prompting. To study this question, we collect a research suite for Zhuang, a language that no current LLM supports. We introduce \textsc{DiPMT++}, a framework for adapting LLMs to unseen languages through in-context learning. Using a dictionary and only 5K parallel sentences, \textsc{DiPMT++} significantly improves GPT-4's performance, from 0 to 16 BLEU on Chinese-to-Zhuang translation, and achieves 32 BLEU on Zhuang-to-Chinese translation. Furthermore, we demonstrate the practical utility of this framework in helping humans translate completely unseen languages, which could contribute to the preservation of linguistic diversity.