Pre-trained language models (PLMs) show impressive performance on various downstream NLP tasks. However, pre-training large language models demands substantial memory and training compute. Furthermore, because of the substantial resources required, many PLM weights are kept confidential. Consequently, users are compelled to share their data with model owners in order to fine-tune for specific tasks. To overcome these limitations, we introduce Plug-in External Memory Adaptation (PEMA), a Parameter-Efficient Fine-Tuning (PEFT) method that enables PLM fine-tuning without requiring access to all the weights. PEMA integrates with context representations from test data during inference to perform downstream tasks. It uses external memory to store PLM-generated context representations mapped to target tokens. Our method uses the weight matrices of a LoRA-like bottlenecked adapter in the PLM's final layer to enhance efficiency. Our approach also includes Gradual Unrolling, a novel interpolation strategy that improves generation quality. We validate PEMA's effectiveness through experiments on syntactic and real datasets for machine translation and style transfer. Our findings show that PEMA outperforms other PEFT approaches in memory and latency efficiency during training, and also excels at maintaining sentence meaning and generating appropriate language and styles.
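The components named above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the adapter shapes (a down-projection to rank r followed by an up-projection to vocabulary logits), and the linearly decaying interpolation weight are all assumptions made for illustration, since the abstract does not specify the exact form of the adapter or of Gradual Unrolling.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def build_memory(contexts: List[Vector], targets: List[int]) -> List[Tuple[Vector, int]]:
    """Pair each PLM-generated context representation with its target token id."""
    if len(contexts) != len(targets):
        raise ValueError("contexts and targets must have the same length")
    return list(zip(contexts, targets))


def matvec(matrix: Matrix, vec: Vector) -> Vector:
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def bottleneck_adapter(h: Vector, down: Matrix, up: Matrix) -> Vector:
    """LoRA-like low-rank map: project h down to rank r, then up to logits.

    Assumed shapes (hypothetical): down is r x d, up is V x r.
    """
    return matvec(up, matvec(down, h))


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interpolation_weight(step: int, total_steps: int, lam_max: float) -> float:
    """Hypothetical schedule: decay linearly from lam_max to 0 over the sentence."""
    if total_steps <= 1:
        return lam_max
    return lam_max * (1 - step / (total_steps - 1))


def interpolate(p_adapter: Vector, p_plm: Vector, lam: float) -> Vector:
    """Mix the adapter's next-token distribution with the PLM's."""
    return [lam * a + (1 - lam) * b for a, b in zip(p_adapter, p_plm)]
```

In this sketch, only the context representations and target tokens in the external memory are needed to train the small `down` and `up` matrices, which is one way to read the claim that fine-tuning does not require access to all PLM weights. At each decoding step, `interpolate` would be called with the weight returned by `interpolation_weight`.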