This paper investigates the effectiveness of token-level text augmentation and the role of probabilistic linguistic knowledge within a linguistically motivated evaluation context. Two text augmentation programs, REDA and REDA$_{NG}$, were developed, both implementing five token-level text editing operations: Synonym Replacement (SR), Random Swap (RS), Random Insertion (RI), Random Deletion (RD), and Random Mix (RM). REDA$_{NG}$ leverages pretrained $n$-gram language models to select the most likely augmented texts from REDA's output. Comprehensive and fine-grained experiments were conducted on a binary question matching classification task in both Chinese and English. The results strongly refute the general effectiveness of the five token-level text augmentation techniques under investigation, whether applied together or separately, and regardless of the classification model type used, including transformers. Furthermore, the role of probabilistic linguistic knowledge is found to be minimal.
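To make the two-stage idea concrete, the following is a minimal sketch of two of the five editing operations (RS and RD) followed by $n$-gram-based selection of the most likely candidate. It is an illustration under stated assumptions, not the REDA or REDA$_{NG}$ implementation: the function names, the add-one-smoothed bigram model trained on a toy corpus (standing in for a pretrained $n$-gram language model), and the choice to edit one token per call are all hypothetical.

```python
import math
import random
from collections import Counter


def random_swap(tokens, rng):
    """RS (illustrative): swap the tokens at two distinct random positions."""
    out = list(tokens)
    if len(out) >= 2:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, rng):
    """RD (illustrative): delete one random token, never emptying the text."""
    out = list(tokens)
    if len(out) >= 2:
        del out[rng.randrange(len(out))]
    return out


def train_bigram(corpus):
    """Count unigrams and bigrams over tokenized sentences.

    Assumption: a toy stand-in for a pretrained n-gram language model.
    """
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        padded = ["<s>"] + list(sent) + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def log_prob(tokens, unigrams, bigrams):
    """Add-one-smoothed bigram log-probability of a token sequence."""
    vocab = len(unigrams)
    padded = ["<s>"] + list(tokens) + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
        for a, b in zip(padded, padded[1:])
    )


def select_most_likely(candidates, unigrams, bigrams, k=1):
    """Keep the k candidates that the language model scores highest."""
    ranked = sorted(
        candidates,
        key=lambda c: log_prob(c, unigrams, bigrams),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    rng = random.Random(0)
    question = ["how", "do", "i", "reset", "my", "password"]
    unigrams, bigrams = train_bigram([question])
    # Generate several augmented candidates, then keep the most likely one.
    candidates = [random_swap(question, rng) for _ in range(5)]
    candidates += [random_deletion(question, rng) for _ in range(5)]
    print(select_most_likely(candidates, unigrams, bigrams, k=1))
```

Note that raw sequence log-probability tends to favor shorter candidates, so a practical selector would likely compare candidates of equal length or normalize the score by length; this sketch leaves that choice open.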