We introduce HATELEXICON, a lexicon of slurs and targets of hate speech for Brazil, Germany, India, and Kenya, to aid the training and interpretability of models. We demonstrate how the lexicon can be used to interpret model predictions, showing that models developed to classify extreme speech rely heavily on target words when making predictions. We further propose a method that uses HATELEXICON to guide shot selection for training in low-resource settings. In few-shot learning, the choice of shots strongly affects model performance. We simulate a few-shot setting for German and Hindi, using HASOC data for training and the Multilingual HateCheck (MHC) as a benchmark. We show that models trained on shots selected with our lexicon perform better on MHC than models trained on randomly sampled shots. Thus, when only a few training examples are available, using our lexicon to select shots that contain more sociocultural information leads to better few-shot performance.
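The abstract does not spell out the selection procedure, so the following is only a minimal sketch of one plausible reading: rank candidate training examples by how many lexicon terms they contain, keep the top k, and compare against a random baseline. The function names, the hit-count scoring rule, and the toy data (placeholder tokens, not real HATELEXICON entries or HASOC examples) are all assumptions made for illustration.

```python
import random
import re


def lexicon_hits(text, lexicon):
    """Count the distinct lexicon terms that occur as whole tokens in text.

    Assumption: single-token, lowercased terms. Multi-word entries would
    need phrase matching instead of this token-set intersection.
    """
    tokens = set(re.findall(r"\w+", text.lower()))
    return len(tokens & lexicon)


def select_shots_by_lexicon(pool, lexicon, k):
    """Return the k examples with the most lexicon hits.

    sorted() is stable, so examples with equal scores keep their pool order.
    """
    lexicon = {term.lower() for term in lexicon}
    ranked = sorted(
        pool,
        key=lambda ex: lexicon_hits(ex["text"], lexicon),
        reverse=True,
    )
    return ranked[:k]


def select_shots_randomly(pool, k, seed=0):
    """Baseline: sample k examples uniformly at random, without replacement."""
    return random.Random(seed).sample(pool, k)


# Hypothetical placeholder data standing in for a lexicon and a training pool.
toy_lexicon = {"targetword_a", "targetword_b", "slurword_c"}
toy_pool = [
    {"text": "nothing relevant here", "label": 0},
    {"text": "mentions targetword_a and slurword_c", "label": 1},
    {"text": "mentions TargetWord_B once", "label": 1},
    {"text": "another neutral sentence", "label": 0},
]

lexicon_shots = select_shots_by_lexicon(toy_pool, toy_lexicon, k=2)
random_shots = select_shots_randomly(toy_pool, k=2)
```

In a real comparison, each shot set would be used to fine-tune or prompt a classifier, and both resulting models would be evaluated on the same benchmark. A practical variant might also balance labels among the selected shots, which this sketch does not attempt.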