Large language models (LLMs) are known to perform tasks effectively after observing just a few exemplars. However, in low-resource languages, obtaining such hand-picked exemplars can still be difficult, so unsupervised techniques may be necessary. Moreover, the strong generative capabilities of LLMs are observed only in high-resource languages, while their performance in under-represented languages lags behind because of imbalanced pre-training data. To elicit LLMs' abilities in low-resource languages without any supervised data, we propose assembling synthetic exemplars from a diverse set of high-resource languages to prompt LLMs to translate from any language into English. These prompts are then used to create intra-lingual exemplars for performing tasks in the target languages. Our unsupervised prompting method performs on par with supervised few-shot learning in LLMs of different sizes for translation between English and 13 Indic and 21 African low-resource languages. We also show that fine-tuning a 7B model on data generated by our method makes it competitive with a 175B model. In non-English translation tasks, our method even outperforms supervised prompting by up to 3 chrF++ in many low-resource languages. In zero-shot multilingual summarization, our method surpasses other English-pivoting baselines by up to 4 ROUGE-L and is also preferred in GPT-4 evaluations.
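As a rough illustration of what "assembling exemplars from a diverse set of high-resource languages" into an X-to-English translation prompt can look like, here is a minimal sketch. It assumes the synthetic exemplar pairs have already been generated (for example, by the LLM itself), and the `Exemplar` class, the `build_prompt` helper, and the "<Language>: ... / English: ..." template are all hypothetical choices made for illustration, not the exact format or procedure used in the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Exemplar:
    """One synthetic translation pair (hypothetical structure)."""
    language: str  # name of the high-resource source language
    source: str    # sentence in that language
    english: str   # its synthetic English translation


def build_prompt(exemplars: list[Exemplar], input_language: str, text: str) -> str:
    """Assemble a few-shot X->English prompt from exemplars in diverse languages.

    The "<Language>: ... / English: ..." template is an assumed format.
    """
    # Each exemplar becomes a source line followed by its English translation.
    blocks = [f"{ex.language}: {ex.source}\nEnglish: {ex.english}" for ex in exemplars]
    # The query uses the same template, leaving the English side for the LLM to complete.
    blocks.append(f"{input_language}: {text}\nEnglish:")
    return "\n\n".join(blocks)


# Usage: exemplars from several high-resource languages, query in a low-resource one.
shots = [
    Exemplar("French", "Le chat dort.", "The cat is sleeping."),
    Exemplar("Spanish", "Hace frío hoy.", "It is cold today."),
    Exemplar("Chinese", "我喜欢读书。", "I like reading."),
]
prompt = build_prompt(shots, "Swahili", "Habari ya asubuhi.")
print(prompt)
```

Mixing source languages in the exemplars, rather than using a single one, is meant to keep the model from anchoring on any one input language, so the same prompt can be reused for an arbitrary low-resource input.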