Parameter-Efficient Fine-Tuning (PEFT) methods have become crucial for rapidly adapting large language models (LLMs) to downstream tasks. Prefix-Tuning, an early and effective PEFT technique, achieved performance comparable to full fine-tuning with significantly reduced computational and memory overhead. However, despite this early success, its effectiveness in training modern state-of-the-art LLMs has been very limited. In this work, we demonstrate empirically that prefix-tuning underperforms on LLMs because of an inherent tradeoff between the contribution of the input prompt and that of the parameterized prefix within the attention head. This motivates us to introduce PrefixMemory-Tuning, an architecture that generalizes the principles of prefix-tuning while addressing its shortcomings by moving the prefix module out of the attention head and improving its expressiveness. Our experiments show that, across diverse benchmarks, PrefixMemory-Tuning consistently outperforms existing prefix-tuning methods. Notably, it is competitive with modern PEFT methods on several general benchmarks, suggesting that prefix-tuning approaches could be extended to reach state-of-the-art performance. Our findings suggest that, once its inherent limitations are overcome, prefix-tuning can remain a competitive and relevant research direction in the landscape of parameter-efficient LLM adaptation.
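The tradeoff inside the attention head can be illustrated with a standard softmax identity. When prefix key/value pairs are concatenated with those of the input, the head's output is a convex combination of attention over the prefix alone and attention over the input alone, so any weight given to one is taken from the other. The following is a minimal sketch of that identity in plain Python, not the authors' analysis or implementation; the helper names (`attend`, `prefix_mass`) and all toy vectors are hypothetical and chosen only for illustration.

```python
import math

def scores_for(query, keys):
    # Scaled dot-product scores between one query and a list of keys.
    scale = 1.0 / math.sqrt(len(query))
    return [scale * sum(q * k for q, k in zip(query, key)) for key in keys]

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    # Single-query softmax attention: weighted average of the values.
    weights = softmax(scores_for(query, keys))
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

def prefix_mass(query, prefix_keys, input_keys):
    # Share of the softmax weight that falls on the prefix positions.
    weights = softmax(scores_for(query, prefix_keys + input_keys))
    return sum(weights[:len(prefix_keys)])

# Hypothetical toy data: 2 prefix positions, 3 input positions, dimension 2.
query = [1.0, 0.5]
prefix_keys = [[0.2, -0.1], [0.4, 0.3]]
prefix_values = [[1.0, 0.0], [0.0, 1.0]]
input_keys = [[0.9, 0.1], [-0.3, 0.8], [0.5, 0.5]]
input_values = [[0.5, 0.5], [0.2, -0.4], [-1.0, 0.3]]

# Attention over the concatenated [prefix; input] sequence.
joint = attend(query, prefix_keys + input_keys, prefix_values + input_values)

# The same output, written as a convex combination of the two parts.
alpha = prefix_mass(query, prefix_keys, input_keys)
only_prefix = attend(query, prefix_keys, prefix_values)
only_input = attend(query, input_keys, input_values)
mixed = [alpha * p + (1.0 - alpha) * x for p, x in zip(only_prefix, only_input)]

print("prefix share alpha:", alpha)
print("joint attention:   ", joint)
print("alpha-mixed output:", mixed)
```

Because the weight `alpha` on the prefix and the weight `1 - alpha` on the input sum to one, increasing the prefix's influence necessarily reduces the input's. Under this reading, moving the prefix module out of the attention head removes that coupling.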