Federated Learning (FL) is a practical approach for collaboratively training deep learning models across user-end devices, protecting user privacy by keeping raw data on-device. In FL, participating user-end devices are highly fragmented in terms of hardware and software configurations. Such fragmentation introduces a new type of data heterogeneity in FL, namely \textit{system-induced data heterogeneity}, as each device generates distinct data depending on its hardware and software configurations. In this paper, we first characterize the impact of system-induced data heterogeneity on FL model performance. We collect a dataset using heterogeneous devices that vary across vendors and performance tiers. Using this dataset, we demonstrate that \textit{system-induced data heterogeneity} reduces accuracy and exacerbates fairness and domain generalization problems in FL. To address these challenges, we propose HeteroSwitch, which adaptively applies generalization techniques (i.e., ISP transformation and SWAD) depending on the level of bias caused by varying hardware (HW) and software (SW) configurations. In our evaluation with a realistic FL dataset (FLAIR), HeteroSwitch reduces the variance of averaged precision across device types by 6.3\%.