Annotators disagree during data labeling, a phenomenon we term annotator label uncertainty. It manifests as variation in labeling quality. Training with a single low-quality annotation per sample degrades model reliability. In this work, we first examine the effects of annotator label uncertainty on a model's generalizability and prediction uncertainty. We observe that both degrade in the presence of low-quality, noisy labels. Meanwhile, our evaluation of existing uncertainty estimation algorithms indicates that they fail to respond to annotator label uncertainty. To mitigate this degradation, prior methods show that training models with labels collected from multiple independent annotators can enhance generalizability; however, they require massive annotation effort. Hence, we introduce a novel perceptual quality-based model training framework that objectively generates multiple labels for training, enhancing reliability while avoiding massive annotation. Specifically, we first select a subset of samples with low perceptual quality scores, ranked by statistical regularities of visual signals. We then assign de-aggregated labels to each sample in this subset to obtain a training set with multiple labels. Our experiments and analysis demonstrate that training with the proposed framework alleviates the degradation of generalizability and prediction uncertainty caused by annotator label uncertainty.
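The two steps named in the abstract (selecting a low-quality subset, then attaching multiple labels to it) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that perceptual quality scores have already been computed by some quality metric, that a higher score means better quality, and that de-aggregated labels are available from a lookup table. The names `select_low_quality`, `build_multilabel_set`, `fraction`, and `deaggregated_labels` are hypothetical and introduced only for illustration.

```python
import numpy as np


def select_low_quality(scores, fraction=0.2):
    """Return indices of the lowest-scoring `fraction` of samples.

    Assumption: a higher score means better perceptual quality.
    """
    scores = np.asarray(scores, dtype=float)
    k = max(1, int(round(fraction * len(scores))))
    # A stable sort keeps the original order among tied scores.
    return np.argsort(scores, kind="stable")[:k]


def build_multilabel_set(labels, scores, deaggregated_labels, fraction=0.2):
    """Build a training set in which low-quality samples carry multiple labels.

    `deaggregated_labels` is a hypothetical mapping from sample index to a
    list of labels; how such labels are generated is not specified here.
    Samples outside the selected subset keep their single original label.
    """
    low = set(select_low_quality(scores, fraction).tolist())
    training_labels = []
    for i, y in enumerate(labels):
        if i in low and i in deaggregated_labels:
            training_labels.append(list(deaggregated_labels[i]))
        else:
            training_labels.append([y])
    return training_labels


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.5, 0.3, 0.8]   # placeholder quality scores
    labels = [0, 1, 2, 1, 0]             # one label per sample
    deagg = {1: [1, 2], 3: [1, 0], 0: [0, 0]}
    print(build_multilabel_set(labels, scores, deagg, fraction=0.4))
```

In this sketch only the selected subset receives more than one label, which reflects the abstract's stated goal of avoiding massive annotation. The value of `fraction` is an arbitrary placeholder.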