Although fine-tuning Large Language Models (LLMs) with multilingual data can rapidly enhance their multilingual capabilities, these models still exhibit a performance gap between the dominant language (e.g., English) and non-dominant ones due to the imbalance of training data across languages. To further enhance the performance of non-dominant languages, we propose ShifCon, a Shift-based Contrastive framework that aligns the internal forward process of other languages toward that of the dominant one. Specifically, it shifts the representations of non-dominant languages into the dominant language subspace, allowing them to access the relatively rich information encoded in the model parameters. The enriched representations are then shifted back into their original language subspace before generation. Moreover, we introduce a subspace distance metric to pinpoint the optimal layer area for shifting representations, and employ multilingual contrastive learning to further enhance the alignment of representations within this area. Experiments demonstrate that ShifCon significantly improves the performance of non-dominant languages, particularly low-resource ones. Further analysis offers additional insights that verify the effectiveness of ShifCon and inform future research.
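The abstract describes the shifting mechanism only at a high level. The sketch below illustrates one plausible realization, assuming the shift is a translation by the difference of per-language mean hidden states and the subspace distance is an L2 distance between those means; the function names (`shift_to_dominant`, `shift_back`, `subspace_distance`) are illustrative and not taken from the paper.

```python
import torch


def shift_to_dominant(h_src: torch.Tensor,
                      mu_src: torch.Tensor,
                      mu_dom: torch.Tensor) -> torch.Tensor:
    """Shift hidden states of a non-dominant language toward the dominant-language
    subspace by adding the difference of the two subspace means (an assumption,
    not the paper's exact operator).

    h_src:  hidden states, shape (seq_len, d_model)
    mu_src: mean hidden state of the source language, shape (d_model,)
    mu_dom: mean hidden state of the dominant language, shape (d_model,)
    """
    return h_src + (mu_dom - mu_src)


def shift_back(h_shifted: torch.Tensor,
               mu_src: torch.Tensor,
               mu_dom: torch.Tensor) -> torch.Tensor:
    """Undo the shift so that generation proceeds in the original language subspace."""
    return h_shifted - (mu_dom - mu_src)


def subspace_distance(mu_a: torch.Tensor, mu_b: torch.Tensor) -> float:
    """A simple per-layer distance between language subspaces (here, the L2 distance
    between mean representations), which could be used to locate the layer area
    where shifting is applied."""
    return torch.norm(mu_a - mu_b, p=2).item()
```

In this sketch, the per-language means would be estimated from layer-wise hidden states on parallel or comparable data, and the layers with the smallest (or most rapidly shrinking) subspace distance would define the area in which representations are shifted and contrastively aligned.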