Text classification tasks often face few-shot scenarios with limited labeled data, so addressing data scarcity is crucial. Data augmentation with mixup has been shown to be effective on various text classification tasks. However, most mixup methods do not account for how learning difficulty varies across different stages of training, and they generate new samples with one-hot labels, which makes the model overconfident. In this paper, we propose a self-evolution learning (SE) based mixup approach for data augmentation in text classification, which generates more adaptive and model-friendly pseudo samples for model training. SE focuses on how the model's learning ability changes over the course of training. To alleviate model overconfidence, we introduce a novel instance-specific label smoothing approach, which linearly interpolates the model's output and the one-hot labels of the original samples to generate new soft labels for label mixup. Through experimental analysis, we demonstrate that, in addition to improving classification accuracy, SE also enhances the model's generalization ability.
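The instance-specific label smoothing and label mixing described above can be illustrated with a minimal sketch. It assumes a fixed smoothing weight `alpha` and a fixed mixing ratio `lam`, and it assumes mixup is applied to input embeddings. These values are not specified in the abstract and are illustrative only. All function names (`instance_soft_label`, `mixup`, etc.) are hypothetical. The self-evolution scheduling that decides which samples to mix is not shown, because the abstract does not describe it.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(label: int, num_classes: int) -> List[float]:
    return [1.0 if k == label else 0.0 for k in range(num_classes)]


def instance_soft_label(label: int, probs: Sequence[float], alpha: float) -> List[float]:
    """Hypothetical instance-specific label smoothing.

    Linearly interpolates this instance's one-hot label with the model's
    predicted distribution for the same instance. alpha is an assumed weight
    on the gold label (alpha = 1 recovers the plain one-hot label).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    hot = one_hot(label, len(probs))
    return [alpha * h + (1.0 - alpha) * p for h, p in zip(hot, probs)]


def mixup(a: Sequence[float], b: Sequence[float], lam: float) -> List[float]:
    # Standard mixup interpolation, used here for both embeddings and labels.
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


# Two toy samples over 3 classes: embeddings, gold labels, and model logits.
emb_i, emb_j = [0.2, 0.4], [1.0, -0.6]
y_i, y_j = 0, 2
probs_i = softmax([2.0, 0.5, 0.1])
probs_j = softmax([0.3, 0.2, 1.5])

alpha, lam = 0.9, 0.7  # assumed hyperparameters, not taken from the paper
soft_i = instance_soft_label(y_i, probs_i, alpha)
soft_j = instance_soft_label(y_j, probs_j, alpha)

mixed_x = mixup(emb_i, emb_j, lam)   # mixed pseudo-sample input
mixed_y = mixup(soft_i, soft_j, lam)  # mixed soft label for that input
print(mixed_x, mixed_y)
```

Both the smoothed labels and the mixed label are convex combinations of probability distributions, so each still sums to 1. Unlike plain one-hot mixup, the mixed label also places some mass on classes the model considers plausible for each source instance, which is the mechanism the abstract credits with reducing overconfidence.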