Parameter-Efficient Fine-Tuning (PEFT) is widely applied as the backend of fine-tuning APIs for large language model (LLM) customization in datacenters. Service providers deploy separate instances for individual PEFT tasks, giving rise to prominent resource inefficiencies, including (1) GPU underutilization from small-scale, PEFT-native operators and (2) device stalls from communication delays and data dependencies in parallelized execution. To address these issues, this paper presents MuxTune, a fine-tuning system that enables resource-efficient concurrent execution of multiple PEFT tasks. The key idea is to multiplex the backbone across independent tasks in a spatial-temporal manner for improved utilization and reduced stalls. Building on flexible, modularized backbone sharing via unified PEFT representations, MuxTune proposes a hierarchical co-scheduling scheme with task-, operator-, and data-level optimizations. Specifically, it fuses tasks through a hybrid of spatial and temporal multiplexing, and orchestrates multi-task operator execution with two-tiered hybrid parallelism. Additionally, MuxTune employs chunk-based data alignment to mitigate inter-task ineffective tokens. Experimental results demonstrate that MuxTune achieves up to $2.33\times$ higher throughput and up to $5.29\times$ lower memory consumption compared to three state-of-the-art baselines.
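To illustrate the data-level optimization, the following is a minimal sketch of chunk-based alignment. It assumes (hypothetically; the paper does not specify this exact procedure) that sequences from multiple PEFT tasks are pooled, sorted by length, and packed into fixed-size chunks so that co-batched sequences have similar lengths, shrinking the padding ("ineffective") tokens that naive per-task batching would introduce.

```python
def chunk_align(task_seqs, chunk_size):
    """Pack sequences from multiple tasks into length-aligned chunks.

    task_seqs: {task_id: [seq_len, ...]} -- per-task sequence lengths.
    Returns a list of chunks, each a list of (task_id, seq_len) pairs;
    at batch time each chunk is padded to its own longest sequence.
    """
    # Pool all tasks' sequences and sort by length so that each chunk
    # holds sequences of similar size, minimizing per-chunk padding.
    flat = sorted(
        ((tid, n) for tid, lens in task_seqs.items() for n in lens),
        key=lambda x: x[1],
    )
    return [flat[i:i + chunk_size] for i in range(0, len(flat), chunk_size)]

def padded_tokens(chunks):
    """Total tokens after padding each chunk to its longest sequence."""
    return sum(max(n for _, n in chunk) * len(chunk) for chunk in chunks)
```

For example, with lengths `{"a": [10, 100, 12], "b": [95, 11, 90]}` and a chunk size of 2, aligned packing groups (10, 11), (12, 90), (95, 100) for 402 padded tokens, versus 570 if the sequences were chunked in arrival order — the kind of inter-task ineffective-token reduction the abstract describes.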