Federated learning (FL) is an emerging machine learning paradigm in which a shared model is learned collaboratively from data distributed across multiple devices, mitigating the risk of data leakage. While recent studies suggest that Vision Transformers (ViTs) outperform Convolutional Neural Networks (CNNs) in handling data heterogeneity in FL, the specific architectural components that underpin this advantage have yet to be identified. In this paper, we systematically investigate how different architectural elements, such as activation functions and normalization layers, affect performance in heterogeneous FL. Through rigorous empirical analyses, we offer first-of-its-kind general guidance on micro-architecture design principles for heterogeneous FL. Intriguingly, our findings indicate that, with strategic architectural modifications, pure CNNs can match or even exceed the robustness of ViTs when handling clients with heterogeneous data in FL. Additionally, our approach is compatible with existing FL techniques and achieves state-of-the-art results across a broad spectrum of FL benchmarks. The code is publicly available at https://github.com/UCSC-VLAA/FedConv.
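To make the FL setting concrete, the following is a minimal sketch of server-side aggregation in FedAvg, a widely used FL baseline. It is not the method proposed in this paper. It assumes each client's model is represented as a flat list of floats, and the function name `fedavg` is hypothetical. Clients train locally and send only parameters, never raw data. The server then averages those parameters, weighting each client by its local dataset size, to produce the shared model.

```python
from typing import List, Sequence


def fedavg(client_weights: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Hypothetical FedAvg-style aggregation: average flattened client
    parameter vectors, weighting each client by its local dataset size."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    global_weights = [0.0] * dim
    for weights, n in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same parameter shape")
        # Each client contributes in proportion to how much data it holds.
        for i, w in enumerate(weights):
            global_weights[i] += (n / total) * w
    return global_weights


if __name__ == "__main__":
    # Two clients: the second holds 3x as much data, so it dominates the average.
    print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

With heterogeneous (non-IID) client data, the locally trained parameters can drift in different directions before they are averaged. This drift is the setting in which the architectural choices studied here matter.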