Rule-based text data augmentation is widely used for NLP tasks due to its simplicity. However, this method can damage the original meaning of the text, ultimately hurting the performance of the model. To overcome this limitation, we propose a straightforward technique for applying soft labels to augmented data. We conducted experiments across seven different classification tasks and empirically demonstrated the effectiveness of our proposed approach. We have publicly released our source code for reproducibility.