Though majority vote among annotators is typically used to produce ground-truth labels in natural language processing, annotator disagreement in tasks such as hate speech detection may reflect differences among group opinions, not noise. Thus, a crucial problem in hate speech detection is whether a statement is offensive to the demographic group that it targets, which may constitute a small fraction of the annotator pool. We construct a model that predicts individual annotator ratings on potentially offensive text and combines this information with the predicted target group of the text to model the opinions of target group members. We show gains across a range of metrics, including a 22% improvement over the baseline at predicting individual annotators' ratings and a 33% improvement at predicting variance among annotators; the predicted variance provides a way to measure model uncertainty downstream. We find that annotators' ratings can be predicted using their demographic information and opinions on online content, without the need to track annotator IDs that link each annotator to their ratings. We also find that using non-invasive survey questions about annotators' online experiences helps to maximize privacy and minimize unnecessary collection of demographic information when predicting annotators' opinions.
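As a rough illustration of why target-group opinions can differ from the pooled label, the sketch below aggregates per-annotator ratings in three ways: the pooled mean, the variance among annotators, and the mean over annotators who belong to the targeted group. This is not the authors' model. The function name `aggregate_ratings`, the 0 to 4 rating scale, and the group labels are all hypothetical, and the ratings here stand in for whatever a per-annotator predictor would output.

```python
from statistics import mean, pvariance


def aggregate_ratings(ratings, groups, target_group):
    """Summarize per-annotator ratings for one text.

    ratings: one offensiveness rating per annotator (hypothetical 0-4 scale).
    groups: the demographic group label of each annotator, same order.
    target_group: the group the text is predicted to target.
    """
    # Ratings from annotators who belong to the targeted group only.
    target_ratings = [r for r, g in zip(ratings, groups) if g == target_group]
    return {
        # Pooled view: what a majority-style aggregate would reflect.
        "overall_mean": mean(ratings),
        # Disagreement among annotators, usable as an uncertainty signal.
        "variance": pvariance(ratings),
        # Opinion of the targeted group; None if no member rated the text.
        "target_group_mean": mean(target_ratings) if target_ratings else None,
    }


# Three annotators from group "a" rate the text as mild,
# two annotators from the targeted group "b" rate it as severe.
result = aggregate_ratings([0, 1, 0, 4, 4], ["a", "a", "a", "b", "b"], "b")
print(result)
```

In this toy input the pooled mean is low because the targeted group is a minority of the annotators, while the target-group mean and the high variance both signal that the text may be offensive to the people it targets.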