Safe artificial intelligence for perception tasks remains a major challenge, partly due to the lack of data with high-quality labels. Annotations themselves are subject to aleatoric and epistemic uncertainty, which is typically ignored during annotation and evaluation. While crowdsourcing enables collecting multiple annotations per image to estimate these uncertainties, this approach is impractical at scale due to the required annotation effort. We introduce a probabilistic label spreading method that provides reliable estimates of aleatoric and epistemic uncertainty of labels. Assuming label smoothness over the feature space, we propagate single annotations using a graph-based diffusion method. We prove that label spreading yields consistent probability estimators even when the number of annotations per data point converges to zero. We present and analyze a scalable implementation of our method. Experimental results indicate that, compared to baselines, our approach substantially reduces the annotation budget required to achieve a desired label quality on common image datasets and achieves a new state of the art on the Data-Centric Image Classification benchmark.
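To make the diffusion step concrete, the following is a minimal sketch of classic graph-based label spreading applied to single annotations. It is not the scalable implementation the abstract refers to. The function names (`knn_affinity`, `spread_labels`), the binary k-nearest-neighbour graph, the damping factor `alpha`, and the use of propagated mass as a rough evidence proxy are all assumptions made for illustration.

```python
import numpy as np


def knn_affinity(X, k=5):
    """Symmetric k-nearest-neighbour affinity matrix with binary weights."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between feature vectors.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)  # exclude self-loops
    idx = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(W, W.T)  # symmetrise the graph


def spread_labels(X, annotations, n_classes, k=5, alpha=0.9):
    """Propagate single annotations over a kNN graph (illustrative sketch).

    annotations: int array holding a class index, or -1 where no
    annotation was collected. Returns per-point class probabilities and
    the total propagated mass per point (a crude evidence proxy, assumed
    here for illustration only).
    """
    W = knn_affinity(X, k)
    deg = W.sum(1)
    S = W / np.sqrt(np.outer(deg, deg))  # D^{-1/2} W D^{-1/2}

    # One-hot encode the available annotations; unannotated rows stay zero.
    Y = np.zeros((len(X), n_classes))
    labelled = annotations >= 0
    Y[labelled, annotations[labelled]] = 1.0

    # Fixed point of F = alpha * S @ F + (1 - alpha) * Y.
    F = np.linalg.solve(np.eye(len(X)) - alpha * S, (1 - alpha) * Y)
    F = np.clip(F, 0.0, None)  # remove tiny negative round-off

    mass = F.sum(1, keepdims=True)
    probs = F / np.maximum(mass, 1e-12)
    return probs, mass.ravel()
```

Rows of `probs` sum to one wherever some annotation mass reaches the point. A point in a graph component without any annotation receives zero mass, which is one reason a practical system would need a better-founded epistemic uncertainty estimate than this proxy.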