Recent advances in pre-trained large foundation models (LFMs) have yielded significant breakthroughs across various domains, including natural language processing and computer vision, and these models have been particularly impactful in medical diagnostic tasks. Leveraging abundant unlabeled data, an LFM for fundus images has been developed using the Vision Transformer (ViT) and a self-supervised learning framework, and it has shown promising performance in fundus disease diagnosis across multiple datasets. Deep learning models, however, have long been challenged by dataset quality issues such as image quality and dataset bias. To investigate how data quality influences this LFM, we conducted experiments on two fundus diagnosis tasks using datasets of varying quality. Specifically, we asked three questions: Is the LFM more robust to image quality issues? Is the LFM affected by dataset bias? Can fine-tuning techniques alleviate these effects? We found that the LFM is more resilient to dataset quality issues, including image quality and dataset bias, than typical convolutional networks. Furthermore, we found that overall fine-tuning is an effective adaptation method for the LFM, mitigating the impact of dataset quality issues.