SkinCon: A skin disease dataset densely annotated by domain experts for fine-grained model debugging and analysis

Deploying artificial intelligence (AI) in high-risk settings such as healthcare requires methods that provide interpretability/explainability or allow fine-grained error analysis. Many recent methods for interpretability/explainability and fine-grained error analysis use concepts, which are meta-labels that are semantically meaningful to humans. However, only a few datasets include concept-level meta-labels, and most of these meta-labels apply to natural images that do not require domain expertise. Existing densely annotated datasets in medicine focus on meta-labels relevant to a single disease, such as melanoma. In dermatology, skin disease is described using an established clinical lexicon that allows clinicians to describe physical exam findings to one another. To provide a medical dataset densely annotated by domain experts with annotations useful across multiple disease processes, we developed SkinCon: a skin disease dataset densely annotated by dermatologists. SkinCon includes 3230 images from the Fitzpatrick 17k dataset densely annotated with 48 clinical concepts, 22 of which are represented by at least 50 images. Two dermatologists chose the concepts from the clinical descriptor terms used to describe skin lesions; examples include "plaque", "scale", and "erosion". The same concepts were also used to label 656 skin disease images from the Diverse Dermatology Images dataset, providing an additional external dataset with diverse skin tone representation. We review potential applications of the SkinCon dataset, such as probing models, concept-based explanations, and concept bottlenecks. Furthermore, we use SkinCon to demonstrate two of these use cases: debugging mistakes of an existing dermatology AI model with concepts, and developing interpretable models with post-hoc concept bottleneck models.
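As an illustration of the "at least 50 images" criterion mentioned above, the following minimal sketch counts how many images carry each concept and keeps only the concepts that meet a support threshold. The annotation layout (a mapping from image identifier to a list of concept names), the function name `concepts_with_min_support`, and the toy file names are assumptions made for this example; they are not the release format of SkinCon.

```python
from collections import Counter


def concepts_with_min_support(annotations, min_images=50):
    """Return {concept: image_count} for concepts present in >= min_images images.

    `annotations` is a hypothetical mapping of image id -> list of concept
    names; the real dataset may be stored differently.
    """
    counts = Counter()
    for concepts in annotations.values():
        # Use a set so a concept listed twice for one image is counted once.
        counts.update(set(concepts))
    return {concept: n for concept, n in counts.items() if n >= min_images}


# Toy data only; these are not real SkinCon annotations.
toy_annotations = {
    "img_001.jpg": ["plaque", "scale"],
    "img_002.jpg": ["plaque"],
    "img_003.jpg": ["erosion"],
}

# A threshold of 2 is used here because the toy set has only three images.
frequent = concepts_with_min_support(toy_annotations, min_images=2)
print(frequent)  # {'plaque': 2}
```

Applied to the full annotation set with the default threshold of 50, a filter of this kind would be expected to yield the subset of well-represented concepts described in the abstract, assuming the annotations are loaded into the layout sketched here.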
