Recent studies have shown that leveraging off-the-shelf or fine-tuned retrievers capable of retrieving high-quality in-context examples significantly improves in-context learning in English. However, adapting these methods to other languages, especially low-resource ones, is challenging due to the scarcity of cross-lingual retrievers and annotated data. In this paper, we introduce XAMPLER: Cross-Lingual Example Retrieval, a method for cross-lingual in-context learning that relies only on annotated English data. XAMPLER first trains a retriever on positive/negative English samples, which are constructed from the predictions of a multilingual large language model during in-context learning. The trained retriever is then directly employed to retrieve English examples as few-shot demonstrations for in-context learning in target languages. Experiments on SIB200, a massively multilingual text classification benchmark covering 176 languages, demonstrate that XAMPLER substantially improves in-context learning performance across languages. Our code is available at https://github.com/cisnlp/XAMPLER.
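The two-step pipeline described above can be sketched as follows. This is a minimal, illustrative sketch, not the paper's implementation: `embed`, `llm_predict`, and the bag-of-words similarity are toy stand-ins for the trainable dense retriever and the multilingual LLM, and the retriever-training step itself is elided.

```python
import math
from collections import Counter


def embed(text):
    # Toy bag-of-words "embedding" (stand-in for a dense sentence encoder).
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse bag-of-words vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def llm_predict(example, query):
    # Stub for the multilingual LLM: pretend in-context learning succeeds
    # when the example shares a token with the query (illustration only).
    return bool(set(example.lower().split()) & set(query.lower().split()))


def mine_pairs(train_set):
    # Step 1: label each candidate English example positive or negative
    # according to whether the LLM answers the query correctly with it
    # in context; these pairs would then train the retriever.
    pairs = []
    for query, _label in train_set:
        for cand, _ in train_set:
            if cand == query:
                continue
            pairs.append((query, cand, llm_predict(cand, query)))
    return pairs


def retrieve(query, pool, k=2):
    # Step 2 (after retriever training, elided here): for a target-language
    # query, retrieve the top-k English examples as few-shot demonstrations.
    q = embed(query)
    return sorted(pool, key=lambda ex: -cosine(q, embed(ex[0])))[:k]
```

For example, `retrieve("team won match", english_pool, k=2)` would return the two pool sentences most similar to the query, to be prepended as demonstrations in the target-language prompt.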