Federated learning and gossip learning are emerging methodologies designed to mitigate data privacy concerns by retaining training data on client devices and sharing only locally trained machine learning (ML) models with others. The primary distinction between the two lies in their approach to model aggregation: federated learning employs a centralized parameter server, whereas gossip learning adopts a fully decentralized mechanism, enabling direct model exchanges among nodes. This decentralized nature often makes gossip learning less efficient than federated learning. Both methodologies involve a critical step: computing a representation of the received ML models and integrating that representation into the existing model. Conventionally, this representation is derived by averaging the received models, as exemplified by the FedAvg algorithm. Our findings suggest that this averaging approach inherently introduces a potential delay in model convergence. We identify the underlying cause and refer to it as the "vanishing variance" problem, where averaging across uncorrelated ML models undermines the optimal variance established by Xavier weight initialization. Unlike federated learning, where the central server ensures model correlation, and unlike traditional gossip learning, which circumvents this problem through model partitioning and sampling, our research introduces a variance-corrected model averaging algorithm. This novel algorithm preserves the optimal variance needed during model averaging, irrespective of network topology or non-IID data distributions. Our extensive simulation results demonstrate that our approach enables gossip learning to achieve convergence efficiency comparable to that of federated learning.
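The vanishing-variance effect described above can be illustrated numerically. The sketch below (an illustration only, not the paper's exact algorithm) averages N uncorrelated Xavier-initialized weight matrices: because Var(mean) = σ²/N for independent values, the naive average has variance N times smaller than the Xavier-optimal value, and a hypothetical rescaling by √N restores it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer shape and model count, chosen for illustration.
fan_in, fan_out, n_models = 256, 256, 10
sigma2_xavier = 2.0 / (fan_in + fan_out)  # Xavier (Glorot) variance

# N independent (uncorrelated) models' weights for one layer.
weights = rng.normal(0.0, np.sqrt(sigma2_xavier),
                     size=(n_models, fan_in, fan_out))

naive_avg = weights.mean(axis=0)           # FedAvg-style element-wise mean
corrected = naive_avg * np.sqrt(n_models)  # rescale to restore variance

print(f"target (Xavier) variance: {sigma2_xavier:.6f}")
print(f"naive average variance  : {naive_avg.var():.6f}")  # ~ sigma2 / N
print(f"corrected variance      : {corrected.var():.6f}")  # ~ sigma2
```

Note that this rescaling is only valid when the averaged models are genuinely uncorrelated; when models are correlated (as federated learning's central server ensures), plain averaging does not shrink the variance in this way.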