While deep learning models such as the Vision Transformer (ViT) have achieved significant advances, they typically require large datasets. Under data privacy regulations, access to many original datasets is restricted, especially for medical images. Federated learning (FL) addresses this challenge by enabling global model aggregation without data exchange. However, the data heterogeneity and class imbalance present in local clients pose challenges for model generalization. This study proposes an FL framework that leverages a dynamic adaptive focal loss (DAFL) for local training and a client-aware aggregation strategy. Specifically, we design a dynamic class imbalance coefficient that adjusts to each client's sample distribution and class distribution, ensuring that minority classes receive sufficient attention and that sparse data are not ignored. To address client heterogeneity, a weighted aggregation strategy is adopted that adapts to data size and characteristics to better capture inter-client variations. Classification results on three public datasets (ISIC, Ocular Disease, and RSNA-ICH) show that the proposed framework outperforms DenseNet121, ResNet50, ViT-S/16, ViT-L/32, FedCLIP, Swin Transformer, CoAtNet, and MixNet in most cases, with accuracy improvements ranging from 0.98\% to 41.69\%. Ablation studies on the imbalanced ISIC dataset validate the effectiveness of the proposed loss function and aggregation strategy compared with traditional loss functions and other FL approaches. The code is available at: https://github.com/AIPMLab/ViT-FLDAF.
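The abstract does not give the exact formulas for DAFL or the aggregation rule, but the two ideas can be illustrated with a minimal sketch. The code below assumes an inverse-frequency choice for the dynamic class imbalance coefficient (the paper's actual coefficient may differ) and uses simple sample-size-weighted averaging as a stand-in for the client-aware aggregation; all function names here are illustrative, not from the released code.

```python
import numpy as np

def dynamic_alpha(class_counts):
    # Inverse-frequency class weights: minority classes get larger alpha.
    # (Illustrative assumption; the paper's coefficient may be defined differently.)
    counts = np.asarray(class_counts, dtype=float)
    inv = 1.0 / np.maximum(counts, 1.0)
    return inv / inv.sum() * len(counts)  # normalize so the mean weight is ~1

def dafl_loss(probs, targets, class_counts, gamma=2.0):
    # Focal loss with a per-client, class-frequency-dependent alpha.
    # probs: (N, C) softmax outputs; targets: (N,) integer labels.
    alpha = dynamic_alpha(class_counts)
    pt = probs[np.arange(len(targets)), targets]  # probability of the true class
    return float(np.mean(alpha[targets] * (1.0 - pt) ** gamma * -np.log(pt)))

def aggregate(client_params, client_sizes):
    # Sample-size-weighted averaging of client parameters (FedAvg-style
    # stand-in for the paper's client-aware aggregation strategy).
    w = np.asarray(client_sizes, dtype=float)
    w /= w.sum()
    return sum(wi * p for wi, p in zip(w, client_params))
```

For example, with class counts [90, 10], `dynamic_alpha` assigns the minority class a weight nine times that of the majority class, so hard minority-class examples dominate the focal term.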