One-shot federated learning (OSFL) reduces the communication cost and privacy risks of iterative federated learning by constructing a global model in a single round of communication. However, most existing methods struggle to achieve robust performance on real-world domains such as medical imaging, or are inefficient when handling non-IID (non-independent and identically distributed) data. To address these limitations, we introduce FALCON, a framework that enhances the effectiveness of OSFL on non-IID image data. The core idea of FALCON is to integrate feature-aware hierarchical token-sequence generation and knowledge distillation into OSFL. First, each client uses a pretrained visual encoder with hierarchical scale encoding to compress images into hierarchical token sequences that capture multi-scale semantics. Second, a multi-scale autoregressive transformer generator models the distribution of these token sequences and generates synthetic sequences. Third, each client uploads its synthetic sequences, along with a local classifier trained on the real token sequences, to the server. Finally, the server incorporates knowledge distillation into global training to reduce reliance on precise distribution modeling. Experiments on medical and natural image datasets validate the effectiveness of FALCON in diverse non-IID scenarios, where it outperforms the best OSFL baselines by 9.58% in average accuracy.
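The server-side distillation step described above can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes standard Hinton-style knowledge distillation, where the uploaded client classifiers act as teachers over synthetic token sequences and the global model is the student. All function names and the loss weighting are illustrative assumptions.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional distillation temperature."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Hard-label cross-entropy plus temperature-softened KL to the teacher.

    student_logits: global model outputs on synthetic token sequences
    teacher_logits: client classifier (teacher) outputs on the same inputs
    labels: hard labels attached to the synthetic sequences (assumed available)
    """
    # Cross-entropy against hard labels.
    p_student = softmax(student_logits)
    n = len(labels)
    ce = -np.mean(np.log(p_student[np.arange(n), labels] + 1e-12))

    # KL(teacher || student) at temperature T, scaled by T^2 as in
    # standard knowledge distillation.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = np.mean(np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)),
                        axis=-1))
    return alpha * ce + (1 - alpha) * (temperature ** 2) * kl
```

Because the KL term pulls the student toward the teacher's soft predictions rather than only toward hard labels, global training tolerates imperfect synthetic sequences, which is the sense in which distillation reduces reliance on precise distribution modeling.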