Mixup is an effective data augmentation method that generates new samples by taking linear combinations of different original samples. However, if the original samples contain noise or aberrant features, Mixup can propagate them to the augmented samples, making the model overly sensitive to these outliers. To address this problem, this paper proposes a new Mixup method called AMPLIFY. AMPLIFY uses the Transformer's own Attention mechanism to reduce the influence of noise and aberrant values in the original samples on the prediction results. It adds no trainable parameters and incurs very low computational cost, thereby avoiding the high resource consumption of common Mixup methods such as Sentence Mixup. Experimental results show that, at a lower computational cost, AMPLIFY outperforms other Mixup methods on text classification tasks across 7 benchmark datasets, offering a new way to further improve the performance of Attention-based pre-trained models such as BERT, ALBERT, RoBERTa, and GPT. Our code is available at https://github.com/kiwi-lilo/AMPLIFY.
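To make the linear-combination idea concrete, the following is a minimal sketch of generic input-level Mixup, not of AMPLIFY itself, whose details are not given in this abstract. The function names `mixup_pair` and `mixup_batch`, the `alpha` parameter, and the choice to draw the mixing weight from a Beta(alpha, alpha) distribution are illustrative assumptions that follow the common Mixup convention. The sketch also shows how an outlier feature in one sample carries over, scaled, into the mixed sample.

```python
import random


def mixup_pair(x_i, y_i, x_j, y_j, lam):
    """Blend two feature vectors and their one-hot labels with weight lam in [0, 1]."""
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x_mix, y_mix


def mixup_batch(xs, ys, alpha=0.2, rng=None):
    """Mix every sample in a batch with a randomly chosen partner.

    Assumption: one mixing weight per batch, drawn from Beta(alpha, alpha).
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    partners = list(range(len(xs)))
    rng.shuffle(partners)  # partners[i] is the sample blended into sample i
    mixed = [
        mixup_pair(xs[i], ys[i], xs[j], ys[j], lam)
        for i, j in enumerate(partners)
    ]
    return mixed, lam


# The second sample has an aberrant feature value (100.0); a quarter of it
# survives in the mixed sample when lam = 0.75.
x_mix, y_mix = mixup_pair([1.0, 0.0], [1.0, 0.0], [0.0, 100.0], [0.0, 1.0], 0.75)
print(x_mix, y_mix)  # [0.75, 25.0] [0.75, 0.25]
```

Because the mixed sample is a weighted sum, any aberrant value in either input appears in the output in proportion to its weight, which is the propagation effect the abstract describes.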