Foundation models, often pre-trained on large-scale data, have achieved remarkable success in jump-starting various vision and language applications. Recent advances further enable foundation models to be adapted to downstream tasks efficiently using only a few training samples, e.g., through in-context learning. Yet the application of such learning paradigms in medical image analysis remains scarce due to the shortage of publicly accessible data and benchmarks. In this paper, we study approaches for adapting foundation models to medical image classification and present a novel dataset and benchmark for their evaluation, i.e., for examining the overall performance of adapting large-scale foundation models to a diverse set of real-world clinical tasks. We collect five sets of medical imaging data from multiple institutes targeting a variety of real-world clinical tasks (22,349 images in total): thoracic disease screening in X-rays, pathological lesion tissue screening, lesion detection in endoscopy images, neonatal jaundice evaluation, and diabetic retinopathy grading. Results of multiple baseline methods on the proposed dataset are reported from both accuracy and cost-effectiveness perspectives.