The advancement of machine learning algorithms in medical image analysis requires larger training datasets. Because expert clinicians are costly to employ for annotating medical images such as chest X-rays, a popular and cost-effective alternative is to extract annotations automatically from free-text medical reports. However, the resulting datasets have been shown to be susceptible to biases and shortcuts. Another strategy for increasing the size of a dataset is crowdsourcing, a widely adopted practice in general computer vision that has also had some success in medical image analysis. In a similar vein, we extend two publicly available chest X-ray datasets with non-expert annotations. However, instead of diagnostic labels, we annotate shortcuts in the form of tubes. We collect 3.5k chest drain annotations for NIH-CXR14 and 1k annotations of four different tube types for PadChest, and release them as the Non-Expert Annotations of Tubes in X-rays (NEATX) dataset. Using the non-expert annotations, we train a chest drain detector that generalizes well to expert labels. Moreover, we compare our annotations with those provided by experts and find "moderate" to "almost perfect" agreement. Finally, we present a pathology agreement study to raise awareness about the quality of ground-truth annotations. We make our dataset available at https://zenodo.org/records/14944064 and our code available at https://github.com/purrlab/chestxr-label-reliability.
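The terms "moderate" and "almost perfect" are the conventional Landis and Koch bands for chance-corrected agreement. The abstract does not name the statistic, so the sketch below assumes Cohen's kappa between one non-expert and one expert rater on binary tube labels (1 = tube present). The function names and the toy labels are hypothetical and only illustrate how such a comparison can be computed and interpreted. This is not the authors' code.

```python
from collections import Counter


def cohens_kappa(a, b):
    """Cohen's kappa for two raters labelling the same items (assumed metric)."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    # Observed agreement: fraction of items on which both raters agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement by chance, from each rater's marginal label frequencies.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label everywhere; kappa is undefined,
        # so by convention we report perfect agreement here.
        return 1.0
    return (observed - expected) / (1 - expected)


def landis_koch(kappa):
    """Map a kappa value to the Landis and Koch (1977) descriptive bands."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical toy labels for eight images (1 = chest drain present).
non_expert = [1, 1, 0, 0, 1, 0, 1, 0]
expert = [1, 1, 0, 0, 1, 0, 0, 0]
k = cohens_kappa(non_expert, expert)
print(k, landis_koch(k))  # 0.75 substantial
```

Here the raters agree on 7 of 8 images (0.875), chance agreement from the marginals is 0.5, so kappa is (0.875 - 0.5) / (1 - 0.5) = 0.75, which falls in the "substantial" band.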