Recent work has demonstrated that tuning continuous prompts on large, frozen pretrained language models (i.e., prefix tuning or P-tuning) can yield performance comparable or superior to fine-tuning. Nevertheless, the effectiveness of such methods in the context of data augmentation, a common strategy for improving learning in low-data regimes, has not been studied. In this paper, we examine several popular task-agnostic data augmentation techniques, namely EDA, Back Translation, and Mixup, when using prefix tuning under data scarcity. We show that data augmentation can boost the performance of prefix tuning models, but the effectiveness of each technique varies, and certain methods can cause a notable degradation in performance, particularly with larger models and on harder tasks. To help understand this behaviour, we run experiments showing that prefix tuning generally has a limited ability to separate the sentence embeddings of different classes of augmented data, and performs particularly poorly on heavily altered data. We also demonstrate that adding a simple contrastive loss helps mitigate these issues for prefix tuning, improving performance on augmented data.
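The abstract does not specify the exact form of the contrastive loss. The following is a minimal sketch, assuming a supervised contrastive objective over labelled sentence embeddings: same-class embeddings are pulled together and different-class embeddings pushed apart, which targets the class-separation problem described above. The function names and the `temperature` default are illustrative assumptions, not the paper's implementation.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    # L2-normalize so dot products become cosine similarities.
    # The epsilon guards against division by zero for all-zero vectors.
    norm = max(math.sqrt(sum(x * x for x in v)), 1e-12)
    return [x / norm for x in v]


def supervised_contrastive_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    temperature: float = 0.1,  # hypothetical default
) -> float:
    """Supervised contrastive loss averaged over anchors that have positives.

    Illustrative sketch only: the exact loss used in the paper may differ.
    """
    z = [normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        # Positives: other samples sharing the anchor's class label.
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled similarity of anchor i to every other sample.
        sims = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n)
            if a != i
        }
        # Numerically stable log-sum-exp over all non-anchor samples.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Mean negative log-probability of the anchor's positives.
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

In a training loop this term would typically be added to the classification loss, e.g. `ce_loss + lam * supervised_contrastive_loss(embs, labels)`, where the weight `lam` is a hypothetical hyperparameter. A batch whose classes are already well separated yields a much lower loss than one where same-class embeddings point in different directions.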