Online social media is rife with offensive and hateful comments, and the sheer number of posts created every second makes their automatic detection necessary. Creating high-quality human-labelled datasets for this task is difficult and costly, especially because non-offensive posts are significantly more frequent than offensive ones. Unlabelled data, however, is abundant and is easier and cheaper to obtain. In this scenario, self-training methods, which use weakly-labelled examples to increase the amount of training data, can be employed. Recent "noisy" self-training approaches incorporate data augmentation techniques to ensure prediction consistency and increase robustness against noisy data and adversarial attacks. In this paper, we experiment with default and noisy self-training using three different textual data augmentation techniques across five pre-trained BERT architectures of varying size. We evaluate our experiments on two offensive/hate-speech datasets and demonstrate that (i) self-training consistently improves performance regardless of model size, resulting in up to +1.5% F1-macro on both datasets, and (ii) noisy self-training with textual data augmentations, despite being successfully applied in similar settings, decreases performance on offensive and hate-speech domains when compared to the default method, even with state-of-the-art augmentations such as backtranslation.
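The self-training loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the setup used in the paper: `TokenCountClassifier` is a hypothetical toy stand-in for a fine-tuned BERT model, the confidence `threshold` and the number of `rounds` are assumed values, and `augment` is a hypothetical hook where a textual augmentation (for example backtranslation) would be applied in the noisy variant.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

Example = Tuple[str, int]  # (text, label); 1 = offensive, 0 = not offensive


class TokenCountClassifier:
    """Toy stand-in for a fine-tuned BERT model (an assumption for illustration)."""

    def fit(self, data: List[Example]) -> "TokenCountClassifier":
        # Count how often each token appears under each class.
        self.counts = {0: Counter(), 1: Counter()}
        for text, label in data:
            self.counts[label].update(text.lower().split())
        return self

    def predict_proba(self, text: str) -> List[float]:
        # Score each class by matching token counts, with add-one smoothing.
        tokens = text.lower().split()
        scores = [1 + sum(self.counts[c][t] for t in tokens) for c in (0, 1)]
        total = sum(scores)
        return [s / total for s in scores]


def self_train(
    labelled: List[Example],
    unlabelled: List[str],
    threshold: float = 0.7,  # assumed confidence cut-off, not from the paper
    rounds: int = 3,         # assumed number of iterations
    augment: Optional[Callable[[str], str]] = None,  # hypothetical noise hook
) -> Tuple[TokenCountClassifier, List[Example]]:
    train = list(labelled)
    pool = list(unlabelled)
    model = TokenCountClassifier().fit(train)
    for _ in range(rounds):
        confident, remaining = [], []
        for text in pool:
            probs = model.predict_proba(text)
            label = max(range(len(probs)), key=probs.__getitem__)
            if probs[label] >= threshold:
                # Default: keep the clean text with its weak label.
                # Noisy variant: the label comes from the clean text,
                # but the next model trains on the augmented text.
                confident.append((augment(text) if augment else text, label))
            else:
                remaining.append(text)
        if not confident:
            break  # nothing new to learn from
        train.extend(confident)
        pool = remaining
        model = TokenCountClassifier().fit(train)  # retrain on the enlarged set
    return model, train
```

With `augment=None` the function follows the default scheme; passing any text-to-text callable switches it to the noisy scheme. In a realistic setting the toy classifier would be replaced by a pre-trained model fine-tuned on `train` at each round.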