In recent years, federated learning (FL) has become a popular solution for training machine learning models in domains with strong privacy concerns. However, the scalability and performance of FL face significant challenges in real-world deployments where data across devices are not independent and identically distributed (non-IID). This heterogeneity in data distribution frequently arises from the spatial distribution of devices and, if not properly handled, degrades model performance. Additionally, FL's typical reliance on centralized architectures introduces bottlenecks and single-point-of-failure risks, which are particularly problematic at scale or in dynamic environments. To close this gap, we propose Field-Based Federated Learning (FBFL), a novel approach that leverages macroprogramming and field coordination to address these limitations through: (i) distributed, spatially based leader election for personalization, which mitigates non-IID data challenges; and (ii) construction of a self-organizing, hierarchical architecture using advanced macroprogramming patterns. Beyond overcoming these limitations, FBFL enables the development of more specialized models tailored to the specific data distribution of each subregion. This paper formalizes FBFL and evaluates it extensively on the MNIST, FashionMNIST, and Extended MNIST datasets. We demonstrate that, under IID data conditions, FBFL performs comparably to the widely used FedAvg algorithm. Furthermore, in challenging non-IID scenarios, FBFL not only outperforms FedAvg but also surpasses other state-of-the-art methods specifically designed to address non-IID data distributions, namely FedProx and Scaffold. Finally, we showcase the resilience of FBFL's self-organizing hierarchical architecture to server failures.
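The abstract does not specify how leaders are elected or how regional models are aggregated, so the following is only a minimal sketch of the general idea under stated assumptions, not FBFL's actual algorithm. It simulates, centrally, a greedy distance-based leader election (a device becomes a leader if no existing leader lies within a hypothetical `radius`), assigns every device to its nearest leader, and then aggregates each region with FedAvg-style averaging weighted by sample count, yielding one specialized model per region. `Device`, `elect_leaders`, `fedavg`, and `regional_models` are hypothetical names introduced for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical device: id, 2D position, local model weights, local sample count."""
    dev_id: int
    pos: tuple[float, float]
    weights: list[float]
    n_samples: int


def elect_leaders(devices: list[Device], radius: float) -> dict[int, int]:
    """Assumed, centralized stand-in for a distributed spatial leader election.

    Scanning devices in id order, a device becomes a leader if no leader chosen
    so far lies within `radius`. Every device then joins its nearest leader.
    Returns a map from device id to leader id.
    """
    leaders: list[Device] = []
    for d in sorted(devices, key=lambda d: d.dev_id):
        if all(math.dist(d.pos, l.pos) > radius for l in leaders):
            leaders.append(d)
    return {
        d.dev_id: min(leaders, key=lambda l: math.dist(d.pos, l.pos)).dev_id
        for d in devices
    }


def fedavg(models: list[tuple[list[float], int]]) -> list[float]:
    """FedAvg-style aggregation: average weight vectors, weighted by sample count."""
    total = sum(n for _, n in models)
    dim = len(models[0][0])
    return [sum(w[i] * n for w, n in models) / total for i in range(dim)]


def regional_models(devices: list[Device], radius: float) -> dict[int, list[float]]:
    """One aggregation round: each leader averages the models in its region,
    producing one model per region (keyed by leader id)."""
    assignment = elect_leaders(devices, radius)
    regions: dict[int, list[tuple[list[float], int]]] = {}
    for d in devices:
        regions.setdefault(assignment[d.dev_id], []).append((d.weights, d.n_samples))
    return {leader: fedavg(ms) for leader, ms in regions.items()}


if __name__ == "__main__":
    # Two spatial clusters with different (non-IID-like) local models.
    devs = [
        Device(0, (0.0, 0.0), [1.0, 0.0], 10),
        Device(1, (1.0, 0.0), [3.0, 0.0], 30),
        Device(2, (10.0, 0.0), [0.0, 5.0], 20),
        Device(3, (11.0, 0.0), [0.0, 7.0], 20),
    ]
    print(regional_models(devs, radius=5.0))  # {0: [2.5, 0.0], 2: [0.0, 6.0]}
```

With a single region (a very large `radius`) this reduces to plain FedAvg over all devices; smaller radii split the space into regions whose models can specialize to local data, which is the intuition behind per-subregion personalization.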