Efficient finetuning of pretrained language transformers is increasingly prevalent for solving natural language processing tasks. While effective, it can still require a large number of tunable parameters. This is a drawback for low-resource applications and for training under differential-privacy constraints, where excessive noise may be introduced during finetuning. To address this, we propose a novel finetuning strategy for language transformers that introduces task-specific parameters in multiple transformer layers. These parameters are derived from fixed random projections of a single trainable vector, enabling finetuning with significantly fewer parameters while maintaining performance. We achieve within 5% of full finetuning performance on GLUE tasks with as few as 4,100 parameters per task, outperforming other parameter-efficient finetuning approaches that use a similar number of per-task parameters. In addition, the random projections can be precomputed at inference time, avoiding additional computational latency. Together, these properties make our method particularly appealing for low-resource applications. Finally, when trained under the same privacy constraints, our method achieves the best or comparable utility relative to several recent finetuning methods, underscoring its effectiveness and potential real-world impact.
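The core idea, deriving per-layer task-specific parameters from fixed random projections of a single trainable vector, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`make_projections`, `layer_parameters`), the Gaussian initialization and its scaling, the toy dimensions, and the choice of one projection matrix per layer are all assumptions made for illustration. The abstract does not specify how the resulting parameters are injected into each layer, so the sketch stops at producing them.

```python
import numpy as np


def make_projections(num_layers, hidden_size, vector_dim, seed=0):
    """Build one fixed (non-trainable) random projection per layer.

    Assumption: Gaussian entries scaled by 1/sqrt(vector_dim); the
    abstract does not state the distribution that is actually used.
    """
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(vector_dim)
    return [
        rng.normal(0.0, scale, size=(hidden_size, vector_dim))
        for _ in range(num_layers)
    ]


def layer_parameters(projections, z):
    """Map the single trainable vector z to one parameter vector per layer."""
    return [P @ z for P in projections]


# Hypothetical toy sizes, chosen only to keep the example small.
num_layers, hidden_size, vector_dim = 4, 16, 8

projections = make_projections(num_layers, hidden_size, vector_dim, seed=0)

# z holds the only trainable values in this sketch; the projections stay frozen.
z = np.zeros(vector_dim)

# Because the projections are fixed, these per-layer vectors can be computed
# once after training and reused for every inference call.
per_layer = layer_parameters(projections, z)

num_trainable = z.size
```

In this sketch the number of trainable values equals `vector_dim` regardless of how many layers receive task-specific parameters, and the projections can be regenerated from the seed, so only `z` and the seed would need to be stored per task.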