Handling Data Heterogeneity via Architectural Design for Federated Visual Recognition

Federated Learning (FL) is a promising research paradigm that enables the collaborative training of machine learning models among various parties without exchanging sensitive information. Nonetheless, retaining data in individual clients introduces fundamental challenges to achieving performance on par with centrally trained models. Our study provides an extensive review of federated learning applied to visual recognition. It underscores the critical role of thoughtful architectural design choices in achieving optimal performance, a factor often neglected in the FL literature. Many existing FL solutions are tested on shallow or simple networks, which may not accurately reflect real-world applications. This practice restricts the transferability of research findings to large-scale visual recognition models. Through an in-depth analysis of diverse cutting-edge architectures such as convolutional neural networks, transformers, and MLP-mixers, we experimentally demonstrate that architectural choices can substantially enhance FL systems' performance, particularly when handling heterogeneous data. We study 19 visual recognition models from five different architectural families on four challenging FL datasets. We also re-investigate the inferior performance of convolution-based architectures in the FL setting and analyze the influence of normalization layers on FL performance. Our findings emphasize the importance of architectural design for computer vision tasks in practical scenarios, effectively narrowing the performance gap between federated and centralized learning. Our source code is available at https://github.com/sarapieri/fed_het.git.
