In this paper, we propose Ahead-of-Time (AoT) P-Tuning, a novel parameter-efficient fine-tuning method for pre-trained Language Models (LMs) that adds an input-dependent bias before each Transformer layer. We evaluate AoT P-Tuning on the GLUE and SuperGLUE benchmarks using RoBERTa and DeBERTa models, showing that it outperforms BitFit and performs comparably to or better than other baseline methods for efficient fine-tuning. Additionally, we assess the inference overhead of AoT P-Tuning and demonstrate that it introduces negligible overhead compared to established baseline methods. Our method enables multi-task inference with a single backbone LM, making it a practical solution for real-world applications.
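To make the idea of an input-dependent bias before each layer concrete, the following is a minimal sketch rather than the implementation evaluated in this paper. It assumes the bias is stored as one lookup table per layer, indexed by token id, which the abstract does not specify. The names `forward_with_aot_bias`, `toy_layer`, `embedding`, and the bias tables are hypothetical, and `toy_layer` is only a stand-in for a frozen Transformer layer.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Hidden = List[Vector]          # one vector per token in the sequence
Table = Sequence[Vector]       # one row per vocabulary entry


def add_vectors(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two vectors of equal length."""
    return [x + y for x, y in zip(a, b)]


def forward_with_aot_bias(
    token_ids: Sequence[int],
    embedding: Table,
    layers: Sequence[Callable[[Hidden], Hidden]],
    bias_tables: Sequence[Table],
) -> Hidden:
    """Run frozen layers, adding a per-token bias before each one.

    Assumption: bias_tables[i][t] is the trainable bias added to the hidden
    state of token id t before layer i. The backbone layers stay unchanged.
    """
    hidden = [list(embedding[t]) for t in token_ids]
    for layer, table in zip(layers, bias_tables):
        # The bias depends only on the input token ids, so it is a lookup.
        biased = [add_vectors(h, table[t]) for h, t in zip(hidden, token_ids)]
        hidden = layer(biased)
    return hidden


def toy_layer(hidden: Hidden) -> Hidden:
    """Hypothetical stand-in for a frozen Transformer layer: doubles values."""
    return [[2.0 * x for x in h] for h in hidden]


# Toy setup: vocabulary of 3 tokens, hidden size 2, two shared frozen layers.
embedding = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
layers = [toy_layer, toy_layer]
zero_table = [[0.0, 0.0] for _ in embedding]

# Two "tasks" share the backbone and differ only in their bias tables.
task_a_tables = [zero_table, zero_table]
task_b_tables = [[[0.5, 0.5], [0.0, 0.0], [0.0, 0.0]], zero_table]

out_a = forward_with_aot_bias([0, 2], embedding, layers, task_a_tables)
out_b = forward_with_aot_bias([0, 2], embedding, layers, task_b_tables)
```

In this sketch the two tasks reuse the same `layers` and differ only in which bias tables are looked up, which is one way to read the claim about multi-task inference with a single backbone LM.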