We propose an efficient layer-specific optimization (ELO) method designed to enhance continual pretraining (CP) for specific languages in multilingual large language models (MLLMs). This approach addresses two common challenges of traditional CP: high computational cost and degradation of source-language performance. The ELO method consists of two main stages: (1) ELO Pretraining, where a small subset of layers, identified in our experiments as the critically important first and last layers, is detached from the original MLLM and trained on the target language. This significantly reduces not only the number of trainable parameters but also the total number of parameters computed in the forward pass, minimizing GPU memory consumption and accelerating training. (2) Layer Alignment, where the newly trained layers are reintegrated into the original model, followed by a brief full fine-tuning step on a small dataset to align the parameters. Experimental results demonstrate that ELO achieves a training speedup of up to 6.46 times over existing methods, improves target-language performance by up to 6.2\% on qualitative benchmarks, and effectively preserves source-language (English) capabilities.
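To make the two stages concrete, below is a minimal PyTorch sketch under simplifying assumptions: `ToyLM`, `detach_elo_layers`, and `align_layers` are hypothetical names for illustration, not the paper's implementation, and the Stage 1 training loop over the detached layers (together with whatever embedding and head are needed to compute a loss on target-language text) is omitted.

```python
import copy
import torch
import torch.nn as nn

class ToyLM(nn.Module):
    """Toy stand-in for a multilingual LM: embedding, transformer blocks, LM head."""
    def __init__(self, vocab_size=1000, dim=64, n_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
             for _ in range(n_layers)]
        )
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, ids):
        h = self.embed(ids)
        for layer in self.layers:
            h = layer(h)
        return self.head(h)

def detach_elo_layers(model):
    """Stage 1 setup: copy out the first and last blocks.

    Only these copies are then trained on target-language data; because the
    frozen middle layers never enter that training graph, both the trainable
    and the forward-pass parameter counts shrink.
    """
    return copy.deepcopy(model.layers[0]), copy.deepcopy(model.layers[-1])

def align_layers(model, first, last, align_batches, lr=1e-5):
    """Stage 2: reinsert the trained layers, then briefly fine-tune the
    whole model on a small dataset to re-align the parameters."""
    model.layers[0], model.layers[-1] = first, last
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for ids, labels in align_batches:
        logits = model(ids)
        loss = loss_fn(logits.reshape(-1, logits.size(-1)), labels.reshape(-1))
        loss.backward()
        opt.step()
        opt.zero_grad()

# Example usage on random token ids (shapes only; real CP uses target-language text):
model = ToyLM()
first, last = detach_elo_layers(model)
ids = torch.randint(0, 1000, (2, 16))
align_layers(model, first, last, [(ids, ids)])
```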