This paper does not describe a novel method. Instead, it studies an essential foundation for reliable benchmarking and, ultimately, real-world application of AI-based image analysis: generating high-quality reference annotations. Previous research has focused on crowdsourcing as a means of outsourcing annotations. However, little attention has so far been given to annotation companies, specifically regarding their internal quality assurance (QA) processes. Therefore, our aim is to evaluate the influence of QA employed by annotation companies on annotation quality and to devise methodologies for maximizing data annotation efficacy. Based on 57,648 instance-segmented images obtained from 924 annotators and 34 QA workers across four annotation companies and Amazon Mechanical Turk (MTurk), we derived the following insights: (1) Annotation companies perform better in terms of both quantity and quality compared to the widely used platform MTurk. (2) Annotation companies' internal QA provides only marginal improvements, if any. However, improving labeling instructions instead of investing in QA can substantially boost annotation performance. (3) The benefit of internal QA depends on specific image characteristics. Our work could enable researchers to derive substantially more value from a fixed annotation budget and change the way annotation companies conduct internal QA.