This paper investigates continuous post-training optimization for small language models and proposes a method for constructing continuous post-training alignment data tailored to them. At its core, the method leverages data guidance from large models to improve the diversity and accuracy of the alignment data. To verify its effectiveness, we take the Qwen2-0.5B-Instruct model as the small-language-model baseline and, using the alignment dataset constructed with the proposed method, train and compare several groups of experiments: SFT (Supervised Fine-Tuning) post-training, KTO (Kahneman-Tversky Optimization) post-training, two-stage SFT-KTO post-training, and model weight fusion. Finally, we evaluate and analyze the performance of the post-trained models, confirming that the proposed continuous post-training optimization method significantly improves the performance of small language models.
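As a minimal sketch of the weight-fusion step mentioned above, the snippet below assumes a simple linear interpolation of parameters between an SFT checkpoint and a KTO checkpoint of the same base model; the paper's actual fusion scheme and interpolation weights may differ, and the `merge_state_dicts` helper and the toy modules are illustrative assumptions only.

```python
import torch
import torch.nn as nn


def merge_state_dicts(sd_a, sd_b, alpha=0.5):
    """Linearly interpolate two state dicts with identical keys and shapes.

    alpha = 1.0 keeps model A, alpha = 0.0 keeps model B.
    This is a generic weight-averaging sketch, not necessarily the
    exact fusion scheme used in the paper.
    """
    merged = {}
    for key, tensor_a in sd_a.items():
        tensor_b = sd_b[key]
        merged[key] = alpha * tensor_a + (1.0 - alpha) * tensor_b
    return merged


# Toy demonstration: two identically shaped modules stand in for an SFT
# checkpoint and a KTO checkpoint derived from the same base model.
sft_model = nn.Linear(4, 4)
kto_model = nn.Linear(4, 4)

fused_sd = merge_state_dicts(sft_model.state_dict(), kto_model.state_dict(), alpha=0.5)

fused_model = nn.Linear(4, 4)
fused_model.load_state_dict(fused_sd)
```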