Accurate and scalable annotation of medical data is critical for the development of medical AI, but obtaining annotation time from medical experts is challenging. Gamified crowdsourcing has shown potential for obtaining highly accurate annotations of medical data at scale. In this study, we demonstrate the same for the segmentation of B-lines, an indicator of pulmonary congestion, on still frames from point-of-care lung ultrasound clips. We collected 21,154 annotations from 214 annotators over 2.5 days. Crowd consensus segmentations agreed more closely with reference standards than individual experts did, both in B-line count (mean squared error 0.239 vs. 0.308, p<0.05) and in the spatial precision of B-line annotations (mean Dice-H score 0.755 vs. 0.643, p<0.05). These results suggest that expert-quality segmentations can be achieved through gamified crowdsourcing.
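The two agreement measures reported above can be illustrated with a minimal sketch. It assumes per-frame integer B-line counts and binary segmentation masks stored as nested lists. The abstract does not define Dice-H, so the hypothetical `dice` function below computes only the standard Dice coefficient, not the Dice-H variant used in the study. The function names `count_mse` and `dice` are illustrative and not taken from the authors' code.

```python
from typing import Sequence


def count_mse(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Mean squared error between per-frame B-line counts (hypothetical helper)."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("expected two non-empty sequences of equal length")
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return sum(squared_errors) / len(squared_errors)


def dice(mask_a: Sequence[Sequence[int]], mask_b: Sequence[Sequence[int]]) -> float:
    """Standard Dice coefficient 2|A∩B| / (|A|+|B|) for two binary masks.

    This is plain Dice, NOT the Dice-H score reported in the study.
    """
    # Flatten the 2D masks into flat lists of booleans.
    a = [bool(v) for row in mask_a for v in row]
    b = [bool(v) for row in mask_b for v in row]
    if len(a) != len(b):
        raise ValueError("masks must contain the same number of pixels")
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    if total == 0:
        # Both masks empty: treated as perfect agreement (an assumed convention).
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    print(count_mse([2, 0, 3], [1, 0, 3]))
    print(dice([[1, 1, 0]], [[1, 0, 0]]))
```

Lower count MSE and higher Dice both indicate closer agreement with the reference standard, which is why the crowd result is better on both measures even though one number is smaller and the other larger.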