Disagreement in natural language annotation has mostly been studied from the perspective of biases introduced by the annotators and the annotation frameworks. Here, we propose to analyze another source of bias: task design bias, which has a particularly strong impact on crowdsourced linguistic annotations, where natural language is used to elicit the interpretations of lay annotators. For this purpose, we look at implicit discourse relation annotation, a task that has repeatedly been shown to be difficult due to the relations' ambiguity. We compare the annotations of 1,200 discourse relations obtained using two distinct annotation tasks and quantify the biases of both methods across four different domains. Both methods are natural language annotation tasks designed for crowdsourcing. We show that the task design can push annotators towards certain relations and that some discourse relation senses can be better elicited with one or the other annotation approach. We also conclude that this type of bias should be taken into account when training and testing models.