Data augmentation is an effective technique for improving the performance of machine learning models, but it has been explored far less extensively in natural language processing (NLP) than in computer vision. In this paper, we propose a novel text augmentation method that leverages the fill-mask capability of the transformer-based BERT model. The method iteratively masks words in a sentence and replaces them with the language model's predictions. We evaluate it on a range of NLP tasks, compare it against existing augmentation methods, and find it effective in many cases. Experimental results show that the proposed method significantly improves performance, especially on topic classification datasets.
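The iterative mask-and-replace loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact procedure evaluated in the paper: the abstract does not specify how positions are chosen, how many iterations are run, or how a replacement is picked among the predictions. Here a position is chosen uniformly at random, the sentence is split on whitespace, and the top-ranked prediction that differs from the original word is used. The names `augment`, `Predictor`, and `bert_predictor` are hypothetical helpers introduced for this example; `bert_predictor` relies on the Hugging Face `transformers` fill-mask pipeline and the `bert-base-uncased` checkpoint, which are also assumptions.

```python
import random
from typing import Callable, List, Optional

MASK = "[MASK]"  # mask token used by BERT tokenizers

# A predictor receives a sentence containing exactly one MASK token and
# returns candidate replacement words, best first.
Predictor = Callable[[str], List[str]]


def augment(
    sentence: str,
    predict: Predictor,
    n_iterations: int = 2,
    rng: Optional[random.Random] = None,
) -> str:
    """Return an augmented copy of `sentence` (assumed selection strategy)."""
    rng = rng or random.Random()
    words = sentence.split()  # simplification: whitespace tokenization
    for _ in range(n_iterations):
        if not words:
            break
        i = rng.randrange(len(words))  # assumption: uniform random position
        original = words[i]
        masked = words[:i] + [MASK] + words[i + 1:]
        candidates = [
            c for c in predict(" ".join(masked))
            # skip word-piece fragments and predictions equal to the original
            if c and not c.startswith("##") and c.lower() != original.lower()
        ]
        if candidates:
            words[i] = candidates[0]  # assumption: take the top prediction
        # otherwise the original word is kept unchanged
    return " ".join(words)


def bert_predictor(model_name: str = "bert-base-uncased", top_k: int = 5) -> Predictor:
    """Wrap a Hugging Face fill-mask pipeline as a Predictor (assumed setup)."""
    from transformers import pipeline  # imported lazily; downloads the model

    fill = pipeline("fill-mask", model=model_name)

    def predict(masked_sentence: str) -> List[str]:
        # For a single mask, the pipeline returns a list of dicts that
        # include a "token_str" field with the predicted token.
        return [r["token_str"].strip() for r in fill(masked_sentence, top_k=top_k)]

    return predict
```

With the `transformers` package installed, `augment("the film was surprisingly good", bert_predictor(), n_iterations=2)` would produce one augmented variant; calling it several times with different random seeds yields multiple variants per training sentence. Passing the predictor as a plain callable keeps the loop independent of any particular model.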