Rare diseases are difficult to address, largely because few reference images exist and the patient population is small. The problem is more pronounced for rare skin diseases, whose long-tailed data distributions make it hard to develop unbiased and broadly effective models. The varied ways in which image datasets are collected, and the distinct purposes they serve, compound these challenges. Our study examines in detail the benefits and drawbacks of episodic and conventional training, adopting a few-shot learning approach alongside transfer learning. We evaluated our models on the ISIC2018, Derm7pt, and SD-198 datasets. With minimal labeled examples, our models achieved substantial information gains and outperformed previously trained models. Our results highlight the improved feature representations of DenseNet121 and MobileNetV2 when initialized with ImageNet-pretrained weights, which increase intra-class similarity. Moreover, in experiments ranging from 2-way to 5-way classification with up to 10 shots, traditional transfer learning methods became increasingly successful as the number of examples grew. Adding data augmentation significantly improved our transfer-learning-based models, which outperformed existing methods, especially on the SD-198 and ISIC2018 datasets. All source code related to this work will be made publicly available soon at the provided URL.
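Episodic training differs from conventional training mainly in how batches are formed. Instead of iterating over the whole labeled set, each step draws an N-way K-shot task (here, 2-way to 5-way with up to 10 shots). The sketch below assumes only a flat list of per-image class labels. The function name `sample_episode` and the `n_query` parameter are hypothetical illustrations, not the authors' code. It also shows a side effect relevant to long-tailed data: classes with too few images cannot appear in an episode at all.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way, k_shot, n_query, rng=None):
    """Hypothetical helper: sample one N-way K-shot episode.

    labels: list of class labels, one per image (list index = image id).
    Returns (support, query), each a list of (image_index, episode_label)
    pairs, where episode_label is in range(n_way).
    """
    rng = rng or random.Random()
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    # Only classes with enough images for both support and query are eligible;
    # in long-tailed data this silently drops the rarest classes.
    eligible = sorted(c for c, idxs in by_class.items() if len(idxs) >= k_shot + n_query)
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + n_query images")
    chosen = rng.sample(eligible, n_way)
    support, query = [], []
    for episode_label, cls in enumerate(chosen):
        # Draw disjoint support and query images for this class.
        picked = rng.sample(by_class[cls], k_shot + n_query)
        support += [(i, episode_label) for i in picked[:k_shot]]
        query += [(i, episode_label) for i in picked[k_shot:]]
    return support, query


# Toy long-tailed label list: class "d" has only 2 images.
labels = ["a"] * 20 + ["b"] * 12 + ["c"] * 6 + ["d"] * 2
support, query = sample_episode(labels, n_way=3, k_shot=5, n_query=1, rng=random.Random(0))
print(len(support), len(query))  # 15 3
```

In a 3-way 5-shot episode with one query image per class, class "d" is never eligible. This illustrates why rare classes are hard to cover with episodic sampling alone.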