As several scholars have pointed out, current research on hate speech (HS) recognition is characterized by unsystematic data creation strategies and diverging annotation schemata. Consequently, supervised learning models tend to generalize poorly to datasets they were not trained on, and the performance of models trained on datasets labeled with different HS taxonomies cannot be compared. To mitigate this problem, we propose applying extremely weak supervision, which relies only on the class name rather than on class samples from the annotated data. We demonstrate the effectiveness of a state-of-the-art weakly supervised text classification model in various in-dataset and cross-dataset settings. Furthermore, we conduct an in-depth quantitative and qualitative analysis of the source of poor generalizability of HS classification models.