Recent advancements in large language models (LLMs) have exposed challenges in computational efficiency and continual scalability stemming from their enormous parameter counts, making it increasingly cumbersome to deploy and evolve these models on devices with limited computational resources and in scenarios that demand diverse abilities. Inspired by the modularity of the human brain, there is a growing tendency to decompose LLMs into numerous functional modules, allowing inference with a subset of modules and the dynamic assembly of modules to tackle complex tasks, as in mixture-of-experts architectures. To highlight the inherent efficiency and composability of this modular approach, we coin the term brick to represent each functional module, and designate the modularized structure as configurable foundation models. In this paper, we offer a comprehensive overview and investigation of the construction, utilization, and limitations of configurable foundation models. We first formalize modules into emergent bricks, functional neuron partitions that emerge during pre-training, and customized bricks, which are constructed via additional post-training to improve the capabilities and knowledge of LLMs. Building on these diverse functional bricks, we further present four brick-oriented operations: retrieval and routing, merging, updating, and growing. These operations allow for the dynamic configuration of LLMs based on instructions to handle complex tasks. To verify our perspective, we conduct an empirical analysis on widely used LLMs and find that FFN layers follow modular patterns, with functional specialization of neurons and functional neuron partitions. Finally, we highlight several open issues and directions for future research. Overall, this paper aims to offer a fresh modular perspective on existing LLM research and to inspire the future creation of more efficient and scalable foundation models.
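The retrieval-and-routing operation described above can be sketched with a minimal mixture-of-experts style gate: a small gating function scores every brick for a given input, and only the top-k bricks are executed and combined. This is a toy illustration, not the paper's implementation; the expert FFNs, the softmax gate, and all dimensions (`d`, `n_experts`, `top_k`) are hypothetical choices made for the example.

```python
# Toy sketch of brick retrieval and routing (mixture-of-experts style).
# All parameters are random placeholders, not trained weights.
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, top_k = 8, 4, 2

# Each "brick" is a tiny two-layer ReLU FFN with its own parameters.
experts = [(rng.standard_normal((d, 2 * d)), rng.standard_normal((2 * d, d)))
           for _ in range(n_experts)]
gate_w = rng.standard_normal((d, n_experts))

def route(x):
    """Run x through the top-k bricks selected by the gate, weighted by
    their renormalized gate scores; the remaining bricks are skipped,
    which is where the inference-time savings come from."""
    logits = x @ gate_w
    top = np.argsort(logits)[-top_k:]            # indices of selected bricks
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                     # softmax over selected bricks
    out = np.zeros(d)
    for w, i in zip(weights, top):
        w1, w2 = experts[i]
        out += w * (np.maximum(x @ w1, 0) @ w2)  # ReLU FFN brick
    return out

y = route(rng.standard_normal(d))
```

Because only `top_k` of the `n_experts` bricks are evaluated per input, compute grows with the number of *active* bricks rather than the total parameter count, which is the efficiency argument the abstract makes for configurable models.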