COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling

Large language models (LLMs) often exhibit performance disparities across languages, and naive multilingual fine-tuning frequently degrades performance because of negative cross-lingual interference. To address this, we introduce COMPASS (COntinual Multilingual PEFT with Adaptive Semantic Sampling), a novel data-centric framework for adapting LLMs to target languages. COMPASS uses parameter-efficient fine-tuning (PEFT) to train lightweight, language-specific adapters on a carefully selected subset of auxiliary multilingual data. At the core of our method is a distribution-aware sampling strategy that uses multilingual embeddings and clustering to identify semantic gaps between the existing training data and a target usage distribution. By prioritizing auxiliary data from under-represented semantic clusters, COMPASS maximizes positive cross-lingual transfer while minimizing interference. We extend this approach into a continual learning framework, COMPASS-ECDA, which monitors production data for distribution shifts and dynamically updates the adapters to prevent model staleness, balancing adaptation to new data against preservation of existing knowledge. Across three model architectures (Phi-4-Mini, Llama-3.1-8B, and Qwen2.5-7B) and several challenging multilingual benchmarks (Global-MMLU and MMLU-ProX, as well as unseen long-context tasks from OneRuler), we show that COMPASS consistently outperforms baselines guided by linguistic similarity, providing an effective, efficient, and sustainable solution for developing and maintaining high-performing multilingual models in dynamic environments.
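The abstract only summarizes the sampling strategy. The following is therefore an illustrative reading of it, not the authors' implementation. It assumes precomputed multilingual embeddings (toy 2-D vectors here) and fits plain k-means on the training plus target-usage data. It then scores each cluster's semantic gap as max(0, p_target - p_train) and spends a hypothetical sample `budget` on auxiliary examples from the clusters with the largest gaps. `drift_score`, a total-variation distance between cluster histograms, is a hypothetical stand-in for the production shift monitoring in COMPASS-ECDA. All function names, the choice of k-means with farthest-first initialization, and the 0.2 drift threshold are assumptions.

```python
from collections import Counter

Vector = list[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Vector, centroids: list[Vector]) -> int:
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))


def kmeans(points: list[Vector], k: int, iters: int = 20) -> list[Vector]:
    """Lloyd's k-means with deterministic farthest-first initialization (assumed choice)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Seed the next centroid with the point farthest from all current centroids.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        groups: list[list[Vector]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for c, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def cluster_distribution(points: list[Vector], centroids: list[Vector]) -> list[float]:
    """Fraction of points assigned to each cluster."""
    counts = Counter(nearest(p, centroids) for p in points)
    return [counts[c] / len(points) for c in range(len(centroids))]


def gap_sample(train, target, aux, k, budget):
    """Hypothetical gap-driven sampler: returns (indices into aux, centroids)."""
    centroids = kmeans(train + target, k)
    p_train = cluster_distribution(train, centroids)
    p_target = cluster_distribution(target, centroids)
    # A cluster's gap is how much more often it occurs in target usage than in training.
    gaps = [max(0.0, t - s) for t, s in zip(p_target, p_train)]
    total_gap = sum(gaps)
    if total_gap == 0:
        return [], centroids
    buckets: dict[int, list[int]] = {c: [] for c in range(k)}
    for i, p in enumerate(aux):
        buckets[nearest(p, centroids)].append(i)
    selected: list[int] = []
    # Visit clusters from largest to smallest gap; quotas are proportional to the gap.
    for c in sorted(range(k), key=lambda c: -gaps[c]):
        quota = round(budget * gaps[c] / total_gap)
        selected.extend(buckets[c][:quota])
    return selected[:budget], centroids


def drift_score(reference, incoming, centroids) -> float:
    """Total-variation distance between the cluster histograms of two samples."""
    p = cluster_distribution(reference, centroids)
    q = cluster_distribution(incoming, centroids)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Toy data: training covers only the region near the origin, while three
# quarters of the target usage sits near (10, 10).
train = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]]
target = [[0.0, 0.05], [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]]
aux = [[0.05, 0.05], [9.9, 10.0], [10.0, 9.9], [0.1, 0.05], [10.05, 10.05]]

selected, centroids = gap_sample(train, target, aux, k=2, budget=2)
print(selected)  # [1, 2]: both picks come from the under-represented cluster

# Production traffic drifts back toward the origin cluster.
incoming = [[0.0, 0.0], [0.1, 0.1], [0.05, 0.0], [10.0, 10.0]]
score = drift_score(target, incoming, centroids)
print(score > 0.2)  # True, so an adapter refresh would be triggered (assumed threshold)
```

Within a cluster, the sketch takes auxiliary examples in pool order to stay short. Ranking them by distance to the cluster centroid would be a natural refinement, but that too is an assumption rather than something the abstract specifies.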

翻译：大语言模型（LLMs）常因负面跨语言干扰而在不同语言间存在性能差异，朴素的多语言微调往往会加剧这一问题。为此，我们提出COMPASS（连续多语言参数高效微调与自适应语义采样），这是一种新颖的数据中心框架，用于将LLM适配至目标语言。COMPASS通过参数高效微调（PEFT）方法，在精心挑选的辅助多语言数据子集上训练轻量级、语言特定的适配器。其核心方法是一种分布感知采样策略，利用多语言嵌入和聚类识别现有训练数据与目标使用分布之间的语义缺口。通过优先选择来自语义代表性不足的聚类中的辅助数据，COMPASS在最大化正向跨语言迁移的同时最小化干扰。我们将其扩展为连续学习框架COMPASS-ECDA，该框架在生成环境中监测数据分布漂移并动态更新适配器以防止模型过时，在适配新数据与保留现有知识之间取得平衡。在三种不同模型架构（Phi-4-Mini、Llama-3.1-8B和Qwen2.5-7B）及多个具有挑战性的多语言基准（包括Global-MMLU、MMLU-ProX和未见过的长上下文任务OneRuler）上的实验表明，COMPASS始终优于以语言相似性为指导的基线方法，为在动态环境中开发和维护高性能多语言模型提供了一种有效、高效且可持续的解决方案。