The growing demand for accurate and equitable AI models in digital dermatology faces a significant obstacle: the scarcity of diverse, high-quality labeled data. In this work, we investigate the potential of domain-specific foundation models for dermatology to address this challenge. We apply self-supervised learning (SSL) techniques to pre-train models on a dataset of over 240,000 dermatological images drawn from public and private collections. Our study evaluates several SSL methods and compares the resulting foundation models against domain-agnostic models, such as those pre-trained on ImageNet, and state-of-the-art models such as MONET across 12 downstream tasks. Unlike previous research, we emphasize the development of smaller models that are better suited to resource-limited clinical settings, facilitating easier adaptation to a broad range of use cases. Our results show that the models pre-trained in this work not only outperform general-purpose models but also approach the performance of models 50 times larger on clinically relevant diagnostic tasks. To promote further research in this direction, we publicly release both the training code and the foundation models, which can benefit clinicians in dermatological applications.