As several scholars have pointed out, current research on hate speech (HS) recognition is characterized by unsystematic data creation strategies and diverging annotation schemata. Consequently, supervised-learning models tend to generalize poorly to datasets they were not trained on, and the performance of models trained on datasets labeled with different HS taxonomies cannot be compared. To mitigate this problem, we propose applying extremely weak supervision that relies only on the class name rather than on class samples from the annotated data. We demonstrate the effectiveness of a state-of-the-art weakly-supervised text classification model in various in-dataset and cross-dataset settings. Furthermore, we conduct an in-depth quantitative and qualitative analysis of the source of the poor generalizability of HS classification models.