As several scholars have pointed out, current research on hate speech (HS) recognition is characterized by unsystematic data creation strategies and diverging annotation schemata. Consequently, supervised-learning models tend to generalize poorly to datasets they were not trained on, and the performance of models trained on datasets labeled under different HS taxonomies cannot be compared. To mitigate this problem, we propose applying extremely weak supervision that relies only on the class name rather than on class samples from the annotated data. We demonstrate the effectiveness of a state-of-the-art weakly supervised text classification model in various in-dataset and cross-dataset settings. Furthermore, we conduct an in-depth quantitative and qualitative analysis of the sources of poor generalizability in HS classification models.
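To make the idea of supervision from class names alone more concrete, the following is a minimal sketch, not the state-of-the-art model evaluated in this work. It swaps in a much simpler technique, seed-word pseudo-labeling: unlabeled texts that mention exactly one class name receive that class as a pseudo-label, and new texts are then scored against the word counts of each pseudo-labeled group. The class names (`hateful`, `friendly`), the example texts, and all function names are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def pseudo_label(docs, class_names):
    """Pseudo-label a document only if it mentions exactly one class name."""
    labeled = []
    for doc in docs:
        tokens = set(tokenize(doc))
        hits = [name for name in class_names if name in tokens]
        if len(hits) == 1:  # skip documents with no or ambiguous evidence
            labeled.append((doc, hits[0]))
    return labeled


def build_profiles(labeled, class_names):
    """Count the words seen in the pseudo-labeled documents of each class."""
    profiles = {name: Counter() for name in class_names}
    for doc, label in labeled:
        profiles[label].update(tokenize(doc))
    return profiles


def classify(doc, profiles):
    """Return the class whose word-count profile overlaps most with the document."""
    tokens = tokenize(doc)
    scores = {
        label: sum(counts[token] for token in tokens)
        for label, counts in profiles.items()
    }
    return max(scores, key=scores.get)


# Hypothetical class names: the only supervision signal used here.
CLASS_NAMES = ["hateful", "friendly"]

# Hypothetical unlabeled corpus (no gold labels are consulted).
UNLABELED = [
    "that comment was hateful and full of insults aimed at migrants",
    "a hateful rant with insults and threats",
    "such a friendly and welcoming reply to the newcomers",
    "friendly advice and welcoming words for everyone",
    "hateful or friendly, the thread was long",  # ambiguous, so it is skipped
]

labeled = pseudo_label(UNLABELED, CLASS_NAMES)
profiles = build_profiles(labeled, CLASS_NAMES)

print(classify("more insults and threats", profiles))
print(classify("welcoming words", profiles))
```

Because nothing in this sketch depends on a dataset's annotation guidelines, the same class names can in principle be applied to any corpus. Raw word counts are a crude choice, however, since frequent function words contribute to every class profile.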