SkinCAP: A Multi-modal Dermatology Dataset Annotated with Rich Medical Captions

With the widespread application of artificial intelligence (AI), particularly deep learning (DL) and vision-based large language models (VLLMs), to skin disease diagnosis, interpretability has become crucial. However, existing dermatology datasets offer limited concept-level meta-labels, and none provides rich medical descriptions in natural language. This deficiency impedes the development of LLM-based methods for dermatological diagnosis. To address this gap and provide a meticulously annotated dermatology dataset with comprehensive natural-language descriptions, we introduce SkinCAP: a multi-modal dermatology dataset annotated with rich medical captions. SkinCAP comprises 4,000 images drawn from the Fitzpatrick 17k skin disease dataset and the Diverse Dermatology Images dataset, annotated by board-certified dermatologists with extensive medical descriptions and captions. Notably, SkinCAP is the world's first such dataset, and it is publicly available at https://huggingface.co/datasets/joshuachou/SkinCAP.
