Data augmentation has proven effective at alleviating data scarcity and improving models' generalization ability. However, the quality of augmented data can vary, especially compared with the raw/original data. To boost deep learning models' performance on augmented samples in text classification tasks, we propose a novel framework that leverages both meta-learning and contrastive learning to reweight augmented samples and refine their feature representations according to their quality. As part of the framework, we propose novel weight-dependent enqueue and dequeue algorithms that effectively exploit the augmented samples' weight/quality information. Experiments show that our framework cooperates well with existing deep learning models (e.g., RoBERTa-base and Text-CNN) and augmentation techniques (e.g., WordNet and Easydata) on specific supervised learning tasks. On seven GLUE benchmark datasets, our framework achieves an average absolute improvement of 1.6% (up to 4.3%) with Text-CNN encoders and an average of 1.4% (up to 4.4%) with RoBERTa-base encoders over the best baseline. We present an in-depth analysis of our framework design, revealing the non-trivial contributions of its network components. Our code is publicly available for better reproducibility.
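The weight-dependent enqueue/dequeue idea can be illustrated with a minimal sketch: a fixed-size negative-sample queue for contrastive learning in which low-weight (low-quality) augmented features are evicted first. The class name `WeightedQueue` and its interface are hypothetical, not the paper's actual implementation.

```python
import heapq

class WeightedQueue:
    """Hypothetical sketch of a weight-dependent contrastive queue:
    enqueue stores (weight, feature) pairs; when the queue is full,
    the lowest-weight augmented sample is dequeued first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []     # min-heap keyed by (weight, insertion order)
        self._counter = 0   # tie-breaker: equal weights dequeue FIFO

    def enqueue(self, feature, weight):
        # Insert the new feature; if over capacity, evict the
        # lowest-quality (smallest-weight) entry.
        heapq.heappush(self._heap, (weight, self._counter, feature))
        self._counter += 1
        if len(self._heap) > self.capacity:
            heapq.heappop(self._heap)

    def negatives(self):
        # All stored features, usable as contrastive negatives.
        return [f for _, _, f in self._heap]
```

Under this scheme, high-quality augmented samples persist longer in the queue and therefore contribute more often as negatives, which is one plausible way weight information could shape contrastive training.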