In Federated Learning (FL), the devices that participate in training usually have heterogeneous resources, e.g., in their available energy. In current FL deployments, devices that do not fulfill certain hardware requirements are often dropped from the collaborative training. However, dropping devices in FL can degrade training accuracy and introduce bias or unfairness. Several works have tackled this problem at the algorithm level, e.g., by letting constrained devices train only a subset of the server's neural network (NN) model. However, these techniques have been observed to be of limited effectiveness with respect to accuracy. Importantly, they make simplistic assumptions about devices' resources through indirect metrics such as the number of multiply-accumulate (MAC) operations or the peak memory requirement. We observe that memory access costs, which such simplistic metrics do not capture, have a significant impact on energy consumption. In this work, we consider, for the first time, on-device accelerator design for FL with heterogeneous devices. We employ compressed arithmetic formats and approximate computing to satisfy limited energy budgets. Using a hardware-aware energy model, we observe that, in contrast to the moderate energy reductions achieved by the state of the art, our technique lowers the energy requirements by 4x while maintaining higher accuracy.
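To make the argument about indirect metrics concrete, the following is a minimal sketch (not the authors' actual energy model) of how a hardware-aware estimate can account for memory accesses in addition to MAC counts. All per-operation energy values and the helper names (EnergyCosts, layer_energy) are illustrative placeholders, not figures from the paper.

```python
# Minimal sketch of a hardware-aware energy model: per-layer energy is
# estimated from both compute (MAC) energy and memory-access energy,
# rather than from MAC counts alone.
from dataclasses import dataclass

@dataclass
class EnergyCosts:
    # Illustrative per-operation energies in picojoules (placeholder values,
    # not measurements from the paper).
    mac_pj: float = 1.0           # one multiply-accumulate
    sram_access_pj: float = 5.0   # one word moved to/from on-chip SRAM
    dram_access_pj: float = 200.0 # one word moved to/from off-chip DRAM

def layer_energy(n_macs: int, sram_accesses: int, dram_accesses: int,
                 costs: EnergyCosts = EnergyCosts()) -> float:
    """Estimated energy (pJ) of processing one layer."""
    return (n_macs * costs.mac_pj
            + sram_accesses * costs.sram_access_pj
            + dram_accesses * costs.dram_access_pj)

# Example: a layer dominated by DRAM traffic can cost far more energy than a
# layer with 10x more MACs, which a MAC-only metric would miss.
compute_bound = layer_energy(n_macs=10_000_000, sram_accesses=2_000_000, dram_accesses=50_000)
memory_bound  = layer_energy(n_macs=1_000_000, sram_accesses=500_000, dram_accesses=2_000_000)
print(f"compute-bound layer: {compute_bound / 1e6:.1f} uJ, "
      f"memory-bound layer: {memory_bound / 1e6:.1f} uJ")
```

With these placeholder costs, the memory-bound layer consumes roughly an order of magnitude more energy despite performing a tenth of the MACs, which is the kind of effect a MAC- or peak-memory-based metric does not expose.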