Large-scale datasets for single-label multi-class classification, such as \emph{ImageNet-1k}, have been instrumental in advancing deep learning and computer vision. However, a critical and often understudied aspect is the comprehensive quality assessment of these datasets, especially with respect to potential multi-label annotation errors. In this paper, we introduce a lightweight, user-friendly, and scalable framework that combines human and machine intelligence for efficient dataset validation and quality enhancement. We call this framework \emph{Multilabelfy}. At the core of Multilabelfy is an adaptable web-based platform that systematically guides annotators through the re-evaluation process, using human-machine interaction to improve dataset quality. Applying Multilabelfy to the ImageNetV2 dataset, we found that approximately $47.88\%$ of the images contained at least two labels, underscoring the need for more rigorous assessment of such influential datasets. Furthermore, our analysis revealed a negative correlation between the number of potential labels per image and model top-1 accuracy, identifying a crucial factor in model evaluation and selection. Our open-source framework, Multilabelfy, offers a convenient, lightweight solution for dataset enhancement, with an emphasis on quantifying the proportion of multi-label images. This study addresses major challenges in dataset integrity and provides key insights into model performance evaluation. Moreover, it highlights the advantages of integrating human expertise with machine capabilities to develop more robust models and more trustworthy data. The source code for Multilabelfy will be available at https://github.com/esla/Multilabelfy. \keywords{Computer Vision \and Dataset Quality Enhancement \and Dataset Validation \and Human-Computer Interaction \and Multi-label Annotation.}