Humans generally acquire new skills without compromising the old; however, the opposite holds for Large Language Models (LLMs), e.g., from LLaMA to CodeLLaMA. To this end, we propose a new post-pretraining method for LLMs based on an expansion of Transformer blocks. We tune the expanded blocks using only the new corpus, efficiently and effectively improving the model's knowledge without catastrophic forgetting. In this paper, we experiment on a corpus of code and math, yielding LLaMA Pro-8.3B, a versatile foundation model initialized from LLaMA2-7B that excels in general tasks, programming, and mathematics. LLaMA Pro and its instruction-following counterpart (LLaMA Pro-Instruct) achieve advanced performance across various benchmarks, demonstrating superiority over existing open models in the LLaMA family and immense potential for reasoning and for addressing diverse tasks as an intelligent agent. Our findings provide valuable insights into integrating natural and programming languages, laying a solid foundation for developing advanced language agents that operate effectively in various environments.
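The abstract states only that the model is expanded with extra Transformer blocks and that only those blocks are tuned on the new corpus. As a rough illustration, the sketch below uses a scalar toy model rather than a real Transformer. Everything in it is a hypothetical stand-in: `ToyBlock`, `expand_blocks`, `run`, and the `group_size` parameter are names invented here. Two design choices are assumptions that the abstract does not state: each new block is interleaved after a group of original blocks, and each new block starts as a copy with its output weight zeroed, so that it initially behaves as the identity.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass
class ToyBlock:
    """Scalar stand-in for a residual block: y = x + w_out * relu(w_in * x)."""
    w_in: float
    w_out: float
    trainable: bool = True

    def forward(self, x: float) -> float:
        hidden = max(0.0, self.w_in * x)
        return x + self.w_out * hidden


def expand_blocks(blocks: List[ToyBlock], group_size: int) -> List[ToyBlock]:
    """Freeze the original blocks and add one trainable block after each group.

    Assumption (not stated in the abstract): each added block copies the last
    block of its group but has its output weight set to zero, so the residual
    branch contributes nothing and the block starts out as the identity.
    """
    if group_size <= 0 or len(blocks) % group_size != 0:
        raise ValueError("group_size must be positive and divide len(blocks)")
    expanded: List[ToyBlock] = []
    for start in range(0, len(blocks), group_size):
        group = blocks[start:start + group_size]
        # Original blocks are kept as frozen copies.
        expanded.extend(replace(block, trainable=False) for block in group)
        # The added block is the only trainable one in this group.
        expanded.append(replace(group[-1], w_out=0.0, trainable=True))
    return expanded


def run(blocks: List[ToyBlock], x: float) -> float:
    """Apply the blocks in order to a scalar input."""
    for block in blocks:
        x = block.forward(x)
    return x


# Hypothetical toy configuration: four base blocks, one added block per two.
base = [ToyBlock(w_in=0.5, w_out=0.1 * (i + 1)) for i in range(4)]
expanded = expand_blocks(base, group_size=2)
```

In this toy setting, `run(expanded, x)` equals `run(base, x)` before any tuning, because each added block starts as the identity. Only the blocks flagged `trainable=True` would then be updated on the new corpus. The original blocks stay fixed, which is the intuition behind avoiding catastrophic forgetting.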