Crowdsourced annotation is vital both for collecting labelled data to train and test automated content moderation systems and for supporting human-in-the-loop review of system decisions. However, annotation tasks such as judging hate speech are subjective and thus highly sensitive to biases stemming from annotator beliefs, characteristics and demographics. We conduct two crowdsourcing studies on Mechanical Turk to examine annotator bias in labelling sexist and misogynistic hate speech. Results from 109 annotators show that annotator political inclination, moral integrity, personality traits, and sexist attitudes significantly impact annotation accuracy and the tendency to tag content as hate speech. In addition, semi-structured interviews with nine crowd workers provide further insights into the influence of subjectivity on annotations. In exploring how workers interpret a task (an interpretation shaped by complex negotiations between platform structures, task instructions, subjective motivations, and external contextual factors), we see that annotations are not only impacted by worker factors but also simultaneously shaped by the structures under which workers labour.