On-device large language models commonly employ task-specific adapters (e.g., LoRAs) to deliver strong performance on downstream tasks. Storing all available adapters is impractical under memory constraints, yet mobile devices typically have enough capacity to hold a limited number of these parameters. This raises a critical challenge: how to select representative adapters that generalize well across multiple tasks, a problem that remains unexplored in the existing literature. We propose D2C, a novel adapter clustering method that leverages minimal task-specific examples (e.g., 10 per task) and employs an iterative optimization process to refine cluster assignments. The adapters within each cluster are merged, yielding multi-task adapters deployable on resource-constrained devices. Experimental results demonstrate that our method effectively boosts performance across the considered storage budgets.
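The abstract does not spell out the exact D2C procedure, but the merge-and-reassign idea it describes can be illustrated with a minimal k-means-style sketch. Everything concrete below is a hypothetical stand-in: adapters are flattened weight vectors, `few_shot_score` is a toy proxy for evaluating a merged adapter on a task's ~10 examples, and `budget` plays the role of the device's storage limit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 12 task-specific adapters (flattened LoRA weights as
# vectors) and a storage budget of 3 merged adapters. These are stand-ins;
# the paper's exact representation and evaluation are not given here.
n_adapters, dim, budget = 12, 64, 3
adapters = rng.normal(size=(n_adapters, dim))

def few_shot_score(merged, adapter):
    # Toy proxy for scoring a merged adapter on a task's few examples:
    # similarity between the merged weights and the task's own adapter.
    return -np.linalg.norm(merged - adapter)

# Iteratively refine cluster assignments: merge adapters within each
# cluster (uniform weight averaging, one common merging choice), then
# reassign each task to whichever merged adapter scores best on it.
assign = rng.integers(0, budget, size=n_adapters)
for _ in range(20):
    merged = np.stack([
        adapters[assign == c].mean(axis=0) if np.any(assign == c)
        else adapters[rng.integers(n_adapters)]  # re-seed an empty cluster
        for c in range(budget)
    ])
    new_assign = np.array([
        max(range(budget), key=lambda c: few_shot_score(merged[c], a))
        for a in adapters
    ])
    if np.array_equal(new_assign, assign):  # assignments converged
        break
    assign = new_assign

print("cluster assignments:", assign)
```

In this sketch the loop alternates between merging and reassignment until assignments stabilize, after which only the `budget` merged adapters would need to be stored on-device.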