Nowadays, billions of phones, IoT devices, and edge devices around the world continuously generate data, enabling many Machine Learning (ML)-based products and applications. However, owing to growing privacy concerns and regulations, these data tend to remain on the devices (clients) rather than being centralized for traditional ML model training. Federated Learning (FL) is a distributed approach in which a single server and multiple clients collaboratively build an ML model without moving data off the clients. Although existing FL studies include their own experimental evaluations, most experiments were conducted in a simulation setting or on a small-scale testbed, which may limit our understanding of how FL behaves in realistic environments. In this empirical study, we systematically conduct extensive experiments on a large network of IoT and edge devices (called IoT-Edge devices) to characterize real-world FL behavior, including learning performance and operation costs (computation and communication). Moreover, we mainly concentrate on heterogeneous scenarios, as heterogeneity is the most challenging issue in FL. By investigating the feasibility of on-device implementation, our study provides valuable insights for researchers and practitioners, promoting the practicality of FL and helping to improve the design of current real-world FL systems.
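To make the server-client collaboration concrete, the following is a minimal sketch of one common FL training loop. It assumes the widely used FedAvg aggregation rule (the server averages client parameters weighted by local sample counts); the abstract does not say which aggregation algorithm the study uses. The linear model, the learning rate, and the helper names `local_update`, `fedavg_round`, and `make_client` are hypothetical and chosen only for illustration. Only model parameters travel to the server. The raw `(x, y)` pairs stay inside each client's list.

```python
import random


def local_update(w, b, data, lr=0.01, epochs=5):
    """Run a few epochs of SGD for y ~ w*x + b on one client's private data."""
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y  # prediction error on one sample
            w -= lr * err * x      # gradient step for squared loss (factor 2 folded into lr)
            b -= lr * err
    return w, b


def fedavg_round(global_model, clients):
    """One FedAvg-style round: every client trains locally starting from the
    global model, and the server averages the returned parameters weighted
    by each client's number of samples. Raw data never leaves a client."""
    total = sum(len(d) for d in clients)
    new_w = new_b = 0.0
    for data in clients:
        w, b = local_update(*global_model, data)
        new_w += w * len(data) / total
        new_b += b * len(data) / total
    return new_w, new_b


def make_client(n, rng):
    """Hypothetical client dataset drawn from y = 3x + 1 plus noise.
    Each client samples x from a different window, a simple form of
    feature skew that mimics heterogeneous (non-IID) clients."""
    lo = rng.uniform(-1.0, 0.5)
    xs = [rng.uniform(lo, lo + 1.0) for _ in range(n)]
    return [(x, 3 * x + 1 + rng.gauss(0, 0.05)) for x in xs]


if __name__ == "__main__":
    rng = random.Random(0)
    # Three clients with unequal data sizes, another source of heterogeneity.
    clients = [make_client(n, rng) for n in (20, 50, 100)]
    model = (0.0, 0.0)
    for _ in range(50):
        model = fedavg_round(model, clients)
    print(f"w={model[0]:.2f}, b={model[1]:.2f}")
```

Weighting by sample count makes the aggregate follow the combined data distribution rather than treating a 20-sample client the same as a 100-sample one. Real deployments on IoT-Edge devices also have to handle stragglers, dropped clients, and communication cost, none of which this toy loop models.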