Toxic comment detection on social media has proven essential for content moderation. This paper compares a wide range of models on a highly skewed multi-label hate speech dataset. Our comparison considers inference time along with several metrics that measure performance and bias. We show that all BERT variants perform similarly, regardless of model size, optimizations, or the language used for pre-training. RNNs are much faster at inference than any of the BERT models, and BiLSTM remains a good compromise between performance and inference time. RoBERTa with Focal Loss offers the best results on bias metrics and AUROC, while DistilBERT combines good AUROC with low inference time. All models are affected by identity-association bias, but BERT, RNN, and XLNet models are less sensitive to it than CNNs and Compact Convolutional Transformers.