Federated learning is a promising framework for training neural networks on widely distributed data. However, performance degrades heavily when the data are heterogeneously distributed. Recent work has shown that this is because the final layer of the network is the most prone to local bias, and some studies have found success by freezing the final layer as an orthogonal classifier. Motivated by the observation that freezing the weights results in constant singular values, we investigate the training dynamics of the classifier by applying singular value decomposition (SVD) to its weights. We find that these dynamics differ between IID and non-IID settings. Based on this finding, we introduce two regularization terms for local training that continuously emulate IID settings: (1) the variance in the dimension-wise probability distribution of the classifier and (2) the hyperspherical uniformity of the encoder's representations. These regularizers encourage local models to act as if they were in an IID setting regardless of the local data distribution, thus offsetting proneness to bias while remaining flexible to the data. In extensive experiments in both label-shift and feature-shift settings, we verify that our method achieves the highest performance by a large margin, especially in highly non-IID cases, and that it scales to larger models and datasets.
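The abstract names the two regularization terms but does not give their formulas, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that "hyperspherical uniformity" can be measured by the log of the mean pairwise Gaussian potential between L2-normalized features (a common uniformity measure), and that the "dimension-wise probability distribution" is the batch-averaged softmax output, whose variance is taken across classes. The function names `uniformity_loss` and `class_probability_variance` and the temperature `t` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalize(vec):
    """Project a vector onto the unit hypersphere (L2 normalization)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def uniformity_loss(features, t=2.0):
    """Assumed uniformity measure: log of the mean Gaussian potential
    exp(-t * ||z_i - z_j||^2) over all pairs of normalized features.
    Lower values mean the features are spread more evenly on the sphere."""
    z = [normalize(f) for f in features]
    potentials = []
    for i in range(len(z)):
        for j in range(i + 1, len(z)):
            sq_dist = sum((a - b) ** 2 for a, b in zip(z[i], z[j]))
            potentials.append(math.exp(-t * sq_dist))
    return math.log(sum(potentials) / len(potentials))


def class_probability_variance(batch_logits):
    """Assumed reading of the first term: average the softmax outputs over
    the batch, then take the variance of that vector across classes.
    It is zero when every class receives the same average probability."""
    probs = [softmax(row) for row in batch_logits]
    n_classes = len(probs[0])
    mean_prob = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    mu = sum(mean_prob) / n_classes  # equals 1 / n_classes for softmax outputs
    return sum((p - mu) ** 2 for p in mean_prob) / n_classes


# Toy comparison: collapsed versus spread-out encoder features.
collapsed = [[1.0, 0.0]] * 4
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
print("uniformity (collapsed):", uniformity_loss(collapsed))
print("uniformity (spread):   ", uniformity_loss(spread))

# Toy comparison: class-balanced versus single-class predictions.
balanced = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
skewed = [[5.0, 0.0, 0.0]] * 3
print("variance (balanced):", class_probability_variance(balanced))
print("variance (skewed):  ", class_probability_variance(skewed))
```

In this toy setup the spread features score lower (more uniform) than the collapsed ones, and the single-class batch has a larger class-probability variance than the balanced one. How the two terms are signed and weighted in the local objective is not stated in the abstract and is left open here.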