Benchmark datasets for digital dermatology inadvertently contain inaccuracies that reduce trust in model performance estimates. We propose a resource-efficient data cleaning protocol to identify issues that escaped previous curation. The protocol leverages an existing algorithmic cleaning strategy, followed by a confirmation process that is terminated by an intuitive stopping criterion. Based on confirmation by multiple dermatologists, we remove irrelevant samples and near duplicates, and we estimate the percentage of label errors in six dermatology image datasets that the International Skin Imaging Collaboration promotes for model evaluation. Along with this paper, we publish revised file lists for each dataset, which should be used for model evaluation. Our work paves the way for more trustworthy performance assessment in digital dermatology.