Parameter-Efficient Fine-Tuning (PEFT) has become the standard for customising Foundation Models (FMs) to user-specific downstream tasks. However, typical PEFT methods require storing multiple task-specific adapters, creating scalability issues because these adapters must be housed and run at the FM server. Traditional prompt tuning offers a potential solution by customising the FM through task-specific input prefixes, but it underperforms compared to other PEFT methods like LoRA. To close this gap, we propose Low-Rank Prompt Adaptation (LoPA), a prompt-tuning-based approach that performs on par with state-of-the-art PEFT methods and full fine-tuning while being more parameter-efficient and requiring no server-side adapter. LoPA generates soft prompts by balancing the sharing of task-specific information across instances with customisation for each instance, and it achieves parameter efficiency through a low-rank decomposition of the instance-specific soft-prompt component. We provide a comprehensive evaluation on multiple natural language understanding, code generation, and code understanding tasks across foundation models of varying sizes.
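The mechanism described above can be sketched as follows. This is a minimal, hypothetical illustration only (the class name, gating choice, and shapes are assumptions, not the paper's implementation): a soft prompt is formed from a task-shared learnable component combined with an instance-specific component produced via low-rank factors from an instance encoding.

```python
import torch
import torch.nn as nn

class LoPAPromptSketch(nn.Module):
    """Hypothetical sketch of the LoPA idea: a soft prompt that mixes a
    task-shared component with a low-rank, instance-specific component."""

    def __init__(self, prompt_len: int = 10, dim: int = 768, rank: int = 4):
        super().__init__()
        # Component shared across all instances of the task.
        self.shared = nn.Parameter(torch.zeros(prompt_len, dim))
        # Low-rank factors: map an instance encoding to a prompt-shaped delta.
        # Parameter count scales with `rank`, not with prompt_len * dim.
        self.down = nn.Linear(dim, rank * prompt_len, bias=False)
        self.up = nn.Parameter(torch.randn(rank, dim) * 0.02)

    def forward(self, instance_enc: torch.Tensor) -> torch.Tensor:
        # instance_enc: (batch, dim), e.g. a pooled encoding of the input.
        batch = instance_enc.size(0)
        prompt_len, rank = self.shared.size(0), self.up.size(0)
        # Instance-specific component via the low-rank decomposition:
        # (batch, prompt_len, rank) @ (rank, dim) -> (batch, prompt_len, dim)
        z = self.down(instance_enc).view(batch, prompt_len, rank)
        inst = z @ self.up
        # A gate (one possible way to "balance" the two components, assumed
        # here for illustration) modulates the shared prompt per instance.
        return torch.sigmoid(inst) * self.shared  # broadcasts over batch
```

The resulting tensor of shape `(batch, prompt_len, dim)` would be prepended to the input embeddings before they reach the frozen FM, so no adapter weights need to live on the server.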