On-device deployment of Large Language Models (LLMs) frequently leverages Low-Rank Adapters (LoRAs) to support diverse downstream tasks under tight resource constraints. To address the limited storage capacity of mobile devices, recent works have explored model merging techniques to fuse multiple LoRAs into a single one. In practice, however, LoRAs are often delivered incrementally, as users request support for new tasks (e.g., novel problem types or languages). This scenario introduces a new challenge: on-device online continual merging, where the objective is to incorporate new LoRAs while preserving performance on previously supported tasks. In this paper, we propose a data-free and computationally efficient strategy for selecting and merging LoRAs when a new one becomes available, assuming the device can store only a limited number of adapters. Extensive experiments across real-world tasks demonstrate that our approach outperforms alternative strategies while adhering to the storage budget and compute limitations of on-device settings.
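To make the problem setting concrete, the select-and-merge loop under a fixed storage budget could be sketched as follows. This is a minimal illustrative sketch, not the paper's actual strategy: the cosine-similarity selection rule, the weighted-average merge, and the budget of three adapters are all assumptions introduced here, and real LoRAs consist of per-layer low-rank factor pairs rather than single dense matrices.

```python
import numpy as np

BUDGET = 3  # hypothetical storage budget: max adapters kept on device


def merge(a: np.ndarray, b: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Toy merge: weighted average of two adapter weight matrices."""
    return alpha * a + (1 - alpha) * b


def add_adapter(pool: list, new: np.ndarray, budget: int = BUDGET) -> list:
    """Handle one incrementally delivered adapter, data-free (weights only).

    If the pool has a free slot, store the new adapter as-is; otherwise
    select the most similar stored adapter (cosine similarity of flattened
    weights, an illustrative choice) and fuse the new one into it.
    """
    if len(pool) < budget:
        pool.append(new)
        return pool
    flat_new = new.ravel()
    sims = [
        float(
            flat_new @ p.ravel()
            / (np.linalg.norm(flat_new) * np.linalg.norm(p.ravel()) + 1e-12)
        )
        for p in pool
    ]
    i = int(np.argmax(sims))       # most similar stored adapter
    pool[i] = merge(pool[i], new)  # merge instead of occupying a new slot
    return pool


# A stream of four toy "LoRA" weight matrices arriving one at a time:
# the fourth arrival exceeds the budget and triggers a merge.
rng = np.random.default_rng(0)
pool = []
for _ in range(4):
    pool = add_adapter(pool, rng.normal(size=(4, 4)))
assert len(pool) <= BUDGET
```

The online constraint is what distinguishes this from one-shot merging: each arrival must be resolved immediately, without revisiting past task data, so the selection rule operates purely on stored adapter weights.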