This paper considers online distributed learning, in which a set of agents cooperatively trains a learning model from streaming data held at distributed sources. Unlike federated learning, the proposed approach does not rely on a central server, only on peer-to-peer communication among the agents. Such a setting arises when data cannot be moved to a centralized location for privacy, security, or cost reasons. To compensate for the absence of a central server, we propose a distributed algorithm that relies on a quantized, finite-time coordination protocol to aggregate the locally trained models. The algorithm also allows the use of stochastic gradients during local training. A stochastic gradient is computed on a randomly sampled subset of the local training data, which makes each local update cheaper than a full gradient computation and lets the algorithm scale to larger local datasets. We analyze the performance of the proposed algorithm in terms of the mean distance from the online solution. Finally, we present numerical results for a logistic regression task.
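The two ingredients named above, local training with stochastic gradients and aggregation of quantized models, can be illustrated with a minimal Python sketch for logistic regression. It is not the algorithm analyzed in the paper. All function names, the uniform quantizer, and the direct averaging step are assumptions introduced here; in particular, `aggregate` computes the average directly and only stands in for the peer-to-peer, finite-time coordination protocol, whose details the abstract does not give.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def stochastic_gradient(w, data, batch_size, rng):
    """Logistic-loss gradient estimated on a random mini-batch of local data.

    data is a list of (features, label) pairs with label in {0, 1}.
    """
    batch = rng.sample(data, min(batch_size, len(data)))
    grad = [0.0] * len(w)
    for x, y in batch:
        err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj / len(batch)
    return grad


def quantize(v, step):
    """Round each entry to the nearest multiple of step (uniform quantizer)."""
    return [step * round(vi / step) for vi in v]


def aggregate(models, step):
    """Hypothetical stand-in for the coordination protocol.

    ASSUMPTION: the quantized models are averaged directly, which only
    illustrates the effect of quantization on the aggregate.
    """
    q = [quantize(m, step) for m in models]
    return [sum(col) / len(q) for col in zip(*q)]


def run_round(models, datasets, lr, batch_size, step, rng):
    """One round: a local stochastic gradient step per agent, then aggregation."""
    updated = []
    for w, data in zip(models, datasets):
        g = stochastic_gradient(w, data, batch_size, rng)
        updated.append([wi - lr * gi for wi, gi in zip(w, g)])
    avg = aggregate(updated, step)
    return [list(avg) for _ in models]
```

In a streaming setting, `run_round` would be called once per time step with each agent's most recent local data. The quantization step `step` sets the trade-off between communication precision and aggregation error.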