In Federated Learning (FL), the devices that participate in training usually have heterogeneous resources, e.g., in terms of energy availability. In current FL deployments, devices that do not fulfill certain hardware requirements are often dropped from the collaborative training. However, dropping devices in FL can degrade training accuracy and introduce bias or unfairness. Several works have tackled this problem on an algorithmic level, e.g., by letting constrained devices train a subset of the server neural network (NN) model. However, it has been observed that these techniques are not effective w.r.t. accuracy. Importantly, they make simplistic assumptions about devices' resources via indirect metrics such as multiply-accumulate (MAC) operations or peak memory requirements. In this work, for the first time, we consider on-device accelerator design for FL with heterogeneous devices. We utilize compressed arithmetic formats and approximate computing, with the goal of satisfying limited energy budgets. Using a hardware-aware energy model, we observe that, contrary to the moderate energy reductions of the state of the art, our technique allows lowering the energy requirements by 4x while maintaining higher accuracy.
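To illustrate the distinction the abstract draws between indirect proxies (MAC counts) and a hardware-aware energy estimate, the following minimal Python sketch contrasts the two for a single convolutional layer. It is not the paper's actual energy model; the 32-bit baseline energy per MAC and the assumption that multiplier energy scales roughly quadratically with operand bit-width are illustrative assumptions.

```python
# Minimal sketch (not the paper's model): a MAC-count-only proxy vs. a
# bit-width-aware per-layer energy estimate. All constants (e_mac_32 and the
# quadratic width scaling) are illustrative assumptions, not measured values.

def macs_conv2d(c_in, c_out, k, h_out, w_out):
    """Multiply-accumulate operations of one conv layer's forward pass."""
    return c_in * c_out * k * k * h_out * w_out

def energy_mac_proxy(macs):
    """Indirect metric: counts MACs only, blind to the arithmetic format."""
    return macs  # "energy" expressed in arbitrary MAC units

def energy_bitwidth_aware(macs, bits, e_mac_32=4.6e-12):
    """Hardware-aware estimate: multiplier energy assumed to scale roughly
    quadratically with operand bit-width relative to a 32-bit baseline."""
    return macs * e_mac_32 * (bits / 32) ** 2

if __name__ == "__main__":
    macs = macs_conv2d(c_in=64, c_out=128, k=3, h_out=28, w_out=28)
    print("MAC-count proxy  :", energy_mac_proxy(macs))
    print("32-bit estimate  :", energy_bitwidth_aware(macs, 32), "J")
    print("8-bit estimate   :", energy_bitwidth_aware(macs, 8), "J")
```

Under these assumptions, the MAC-count proxy assigns identical cost to a layer regardless of arithmetic format, whereas the bit-width-aware estimate captures how compressed formats reduce per-operation energy, which is the kind of effect an indirect metric cannot express.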