Cross-lingual in-context learning (XICL) has emerged as a transformative paradigm for leveraging large language models (LLMs) to tackle multilingual tasks, especially for low-resource languages. However, existing approaches often rely on external retrievers or task-specific fine-tuning, limiting their scalability and generalizability. In this paper, we propose a novel self-supervised framework that harnesses the generative capabilities of LLMs to internally select and utilize task-relevant examples. Our method introduces two key objectives: a retrieval-generation alignment loss to optimize the quality of selected examples and a semantic coherence loss to ensure cross-lingual consistency. Through extensive experiments on multilingual benchmarks, our approach achieves state-of-the-art performance, significantly outperforming existing baselines. Further analysis highlights its robustness across diverse language families and its ability to generalize to unseen tasks. Human evaluations confirm the superior fluency, relevance, and semantic correctness of outputs generated by our method. This work provides a scalable, effective, and generalizable solution for cross-lingual in-context learning.
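The abstract names two training objectives but gives no formulas in this excerpt. The sketch below is a minimal illustration of how such a pair of objectives could be combined, assuming the alignment loss is a KL-divergence between the retriever's distribution over candidate examples and a distribution induced by downstream generation gains, and the coherence loss is cosine distance between cross-lingual sentence embeddings. All function names, loss forms, and the weighting hyperparameter `lam` are assumptions, not the paper's actual method.

```python
# Hypothetical sketch of the two objectives described in the abstract.
# The exact formulations are not given in this excerpt; the loss forms
# and weighting below are illustrative assumptions.
import numpy as np

def retrieval_generation_alignment_loss(retrieval_scores, generation_gains):
    """Push the retriever's softmax distribution over candidate in-context
    examples toward the (softmaxed) downstream generation gains, so that
    examples which actually help generation receive high retrieval mass."""
    p = np.exp(retrieval_scores - retrieval_scores.max())
    p /= p.sum()
    q = np.exp(generation_gains - generation_gains.max())
    q /= q.sum()
    # KL(q || p); zero when retrieval already matches generation utility
    return float(np.sum(q * (np.log(q + 1e-9) - np.log(p + 1e-9))))

def semantic_coherence_loss(src_emb, tgt_emb):
    """Cross-lingual consistency as 1 - cosine similarity between
    source-language and target-language sentence embeddings."""
    cos = np.dot(src_emb, tgt_emb) / (
        np.linalg.norm(src_emb) * np.linalg.norm(tgt_emb))
    return float(1.0 - cos)

def total_loss(retrieval_scores, generation_gains, src_emb, tgt_emb, lam=0.5):
    # lam balances the two objectives (assumed hyperparameter)
    return (retrieval_generation_alignment_loss(retrieval_scores,
                                                generation_gains)
            + lam * semantic_coherence_loss(src_emb, tgt_emb))
```

Under this reading, both terms vanish exactly when retrieval scores already rank examples by their generation utility and the two languages' embeddings coincide.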