Annotating discourse relations is known to be difficult, especially for non-expert annotators. In this paper, we investigate novice annotators' uncertainty when annotating discourse relations in spoken conversational data. We find that dialogue context (a single turn, a pair of turns within a speaker, or a pair of turns across speakers) is a significant predictor of confidence scores. We compute distributed representations of discourse relations from co-occurrence statistics that incorporate information about confidence scores and dialogue context. Using these representations, we perform a hierarchical clustering analysis and show that weighting discourse relation representations by confidence and dialogue context coherently models our annotators' uncertainty about discourse relation labels.
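The pipeline summarized in the abstract (confidence-weighted co-occurrence vectors followed by hierarchical clustering) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy annotations, the label names, weighting each co-occurrence by the product of the two confidence scores, keying features on (co-occurring label, dialogue context), the cosine distance, and the average-linkage criterion. The helper names `build_vectors`, `cosine_distance`, and `average_linkage` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations, product
from math import sqrt

# Hypothetical toy annotations (not from the paper):
# (item_id, dialogue_context, label, confidence in [0, 1]).
ANNOTATIONS = [
    ("u1", "single", "Elaboration", 0.9),
    ("u1", "single", "Explanation", 0.6),
    ("u2", "within", "Elaboration", 0.8),
    ("u2", "within", "Explanation", 0.7),
    ("u3", "across", "Question", 0.9),
    ("u3", "across", "Clarification", 0.8),
    ("u4", "across", "Clarification", 0.5),
    ("u4", "across", "Question", 0.6),
]


def build_vectors(annotations):
    """Map each label to a sparse vector over (co-occurring label, context).

    Assumption: two labels co-occur when they are assigned to the same item,
    and each co-occurrence is weighted by the product of the two confidences.
    A label also co-occurs with itself, so that labels which are confused
    with each other end up sharing features.
    """
    by_item = defaultdict(list)
    for item, context, label, conf in annotations:
        by_item[item].append((context, label, conf))

    vectors = defaultdict(lambda: defaultdict(float))
    for rows in by_item.values():
        for (ctx, a, conf_a), (_, b, conf_b) in product(rows, repeat=2):
            vectors[a][(b, ctx)] += conf_a * conf_b
    return {label: dict(vec) for label, vec in vectors.items()}


def cosine_distance(u, v):
    """Cosine distance between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(key, 0.0) for key, w in u.items())
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return 1.0 - dot / norm if norm else 1.0


def average_linkage(vectors):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""
    clusters = [(label,) for label in sorted(vectors)]
    merges = []

    def dist(c1, c2):
        # Average pairwise distance between the members of two clusters.
        total = sum(cosine_distance(vectors[a], vectors[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Merge the closest pair of clusters at each step.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: dist(clusters[pair[0]], clusters[pair[1]]),
        )
        merges.append((clusters[i], clusters[j], dist(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


if __name__ == "__main__":
    for left, right, distance in average_linkage(build_vectors(ANNOTATIONS)):
        print(f"{left} + {right}: {distance:.4f}")
```

On this toy data, labels that annotators assign to the same items are merged first, and the two resulting groups are joined only in the final step, because they share no features. Dropping the confidence weights (setting every confidence to 1) would give plain co-occurrence counts, which is one way to compare weighted and unweighted representations.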