MergePipe: A Budget-Aware Parameter Management System for Scalable LLM Merging

Large language model (LLM) merging has become a key technique in modern LLM development pipelines, enabling the integration of multiple task- or domain-specific expert models without retraining. However, as the number of experts grows, existing merging implementations treat model parameters as unstructured files and execute merges in a stateless, one-shot manner, leading to excessive disk I/O, redundant parameter scans, and poor scalability. In this paper, we present \textbf{MergePipe}, a parameter management system for scalable LLM merging. MergePipe is the first system that treats LLM merging as a data management and execution problem, and introduces a catalog-driven abstraction over model parameters, merge plans, and execution lineage. At its core, MergePipe employs a cost-aware planner that explicitly models expert parameter I/O and enforces user-specified I/O budgets, followed by a streaming execution engine that materializes merged models under transactional guarantees. Our key insight is that while base model reads and output writes are unavoidable, expert parameter reads dominate merge cost and constitute the primary optimization target. By making expert access budget-aware throughout planning and execution, MergePipe mitigates the $O(K)$ I/O growth of naive pipelines and achieves predictable scaling behavior. Experiments show that MergePipe reduces total I/O by up to an order of magnitude and delivers up to $11\times$ end-to-end speedups (up to 90\% wall-time reduction) over state-of-the-art LLM merging pipelines.
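To make the I/O argument concrete, the following minimal sketch contrasts the expert-read cost of a naive pipeline with a budget-capped plan. It is illustrative only and not MergePipe's actual planner: the `ExpertBlock` record, its `score` field, the greedy score-per-byte selection rule, and the assumption that the merged output is the same size as the base model are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpertBlock:
    """One contiguous parameter block of one expert (hypothetical record)."""
    expert: str
    block: str
    size_bytes: int   # assumed to be strictly positive
    score: float      # hypothetical importance estimate; not from the paper


def naive_io_bytes(base_bytes: int, expert_bytes_each: int, num_experts: int) -> int:
    """Total I/O when every expert is read in full.

    Base read + K full expert reads + output write (assumed equal to the
    base size). The expert term grows linearly in K.
    """
    return base_bytes + num_experts * expert_bytes_each + base_bytes


def plan_expert_reads(blocks: list[ExpertBlock], budget_bytes: int):
    """Greedily pick expert blocks by score per byte without exceeding the budget.

    This greedy rule is an assumption for illustration; the source only
    states that the planner models expert I/O and enforces a user budget.
    """
    ranked = sorted(blocks, key=lambda b: b.score / b.size_bytes, reverse=True)
    chosen, used = [], 0
    for b in ranked:
        if used + b.size_bytes <= budget_bytes:
            chosen.append(b)
            used += b.size_bytes
    return chosen, used


def budgeted_io_bytes(base_bytes: int, expert_read_bytes: int) -> int:
    """Total I/O when expert reads are capped: base read + planned reads + output write."""
    return base_bytes + expert_read_bytes + base_bytes


if __name__ == "__main__":
    blocks = [
        ExpertBlock("e1", "attn", 4, 8.0),
        ExpertBlock("e1", "mlp", 6, 3.0),
        ExpertBlock("e2", "attn", 4, 4.0),
        ExpertBlock("e2", "mlp", 6, 9.0),
    ]
    chosen, used = plan_expert_reads(blocks, budget_bytes=10)
    print("naive:", naive_io_bytes(10, 10, 2))
    print("budgeted:", budgeted_io_bytes(10, used))
    print("blocks read:", [(b.expert, b.block) for b in chosen])
```

Under these assumptions the base read and output write are fixed costs, so the only term the plan can shrink is the expert-read term, which is bounded by the budget rather than by the number of experts $K$.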