In medical imaging, leveraging large-scale datasets from multiple institutions is crucial for developing accurate deep learning models, yet privacy concerns frequently impede data sharing. Federated learning (FL) has emerged as a prominent solution for preserving privacy while enabling collaborative learning. However, its application in real-world scenarios faces several obstacles, such as task and data heterogeneity, label scarcity, non-identically distributed (non-IID) data, and computational variation. In practice, medical institutions may not want to disclose their tasks to the FL server, and out-of-network institutions with unseen tasks that want to join an ongoing federated system pose a generalization challenge. This study addresses the task-agnostic and unseen-task generalization problems by adapting a self-supervised FL framework. Using a Vision Transformer (ViT) as a consensus feature encoder for self-supervised pre-training, which requires no initial labels, the framework enables effective representation learning across diverse datasets and tasks. Our extensive evaluations on various real-world non-IID medical imaging datasets validate the approach's efficacy: it retains 90\% of F1 accuracy with only 5\% of the training data typically required for centralized approaches and exhibits superior adaptability to out-of-distribution tasks. The results indicate that a federated learning architecture is a potential approach toward multi-task foundation modeling.