We introduce HATELEXICON, a lexicon of slurs and targets of hate speech for Brazil, Germany, India, and Kenya, to aid model training and interpretability. We demonstrate how our lexicon can be used to interpret model predictions, showing that models developed to classify extreme speech rely heavily on target words when making predictions. Further, we propose a method that uses HATELEXICON to aid shot selection for training in low-resource settings. In few-shot learning, the selection of shots is of paramount importance to model performance. In our work, we simulate a few-shot setting for German and Hindi, using HASOC data for training and the Multilingual HateCheck (MHC) as a benchmark. We show that selecting shots based on our lexicon yields models that perform better on MHC than models trained on randomly sampled shots. Thus, when only a few training examples are available, using our lexicon to select shots containing more sociocultural information leads to better few-shot performance.
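The abstract does not spell out the exact selection criterion, so the following is only a minimal sketch of the general idea of lexicon-guided shot selection, contrasted with the random-sampling baseline. It assumes that candidate examples are ranked by how many lexicon entries they contain, that lexicon entries are single lowercase tokens, and that tokens are found by whitespace splitting. The function names (`lexicon_score`, `select_shots_by_lexicon`, `select_shots_randomly`) and the placeholder terms are hypothetical and do not come from the paper.

```python
import random
import string
from typing import List, Set


def lexicon_score(text: str, lexicon: Set[str]) -> int:
    """Count whitespace tokens in `text` that appear in the lexicon.

    Assumption: lexicon entries are single lowercase tokens. Surrounding
    punctuation is stripped, and whitespace splitting keeps Devanagari
    vowel signs attached to their words.
    """
    tokens = (tok.strip(string.punctuation) for tok in text.lower().split())
    return sum(1 for tok in tokens if tok in lexicon)


def select_shots_by_lexicon(pool: List[str], lexicon: Set[str], k: int) -> List[str]:
    """Hypothetical lexicon-guided selection: keep the k examples with the most matches.

    Python's sort is stable (also with reverse=True), so ties keep pool order.
    """
    ranked = sorted(pool, key=lambda t: lexicon_score(t, lexicon), reverse=True)
    return ranked[:k]


def select_shots_randomly(pool: List[str], k: int, seed: int = 0) -> List[str]:
    """Random-sampling baseline: draw k distinct examples with a fixed seed."""
    rng = random.Random(seed)
    return rng.sample(pool, k)


if __name__ == "__main__":
    # Placeholder terms stand in for real lexicon entries.
    lexicon = {"groupa", "groupb"}
    pool = ["I like groupa", "the weather is nice", "groupa and groupb again", "hello there"]
    print(select_shots_by_lexicon(pool, lexicon, 2))
    print(select_shots_randomly(pool, 2))
```

With the placeholder pool above, the lexicon-guided selector returns `["groupa and groupb again", "I like groupa"]`, while the baseline returns an arbitrary pair. Counting matches is just one plausible proxy for "more sociocultural information". A real setup would also need to handle multi-word lexicon entries and balance labels across the selected shots.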