Stochastic optimization methods such as mirror descent are widely used because of their low computational cost. These methods have been well studied under the assumption that samples are independent and identically distributed (i.i.d.), and they usually achieve a sublinear rate of convergence. However, this assumption may be too strong and impractical in real application scenarios. Recent research investigates stochastic gradient descent when instances are sampled from a Markov chain. Unfortunately, few results are known for stochastic mirror descent. In this paper, we propose a new version of stochastic mirror descent, termed MarchOn, for the federated learning setting. Given a distributed network, the model iteratively travels from a node to one of its neighbours chosen at random. Furthermore, we propose a new framework to analyze MarchOn, which yields the best rates of convergence for convex, strongly convex, and non-convex losses. Finally, we conduct empirical studies to evaluate the convergence of MarchOn and validate the theoretical results.
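The random-walk update described above can be illustrated with a minimal sketch. This is not the MarchOn algorithm itself, whose details the abstract does not give; it only shows the general idea of a model that takes one mirror descent step at the node it currently visits and then moves to a uniformly chosen neighbour. Everything specific here is an assumption made for illustration: the entropic mirror map on the probability simplex, the quadratic local losses, the ring topology, the 1/sqrt(t) step size, and all function names.

```python
import math
import random


def mirror_step(x, grad, step):
    """One entropic mirror descent step on the probability simplex (assumed mirror map)."""
    shift = min(grad)  # shifting the gradient keeps the exponents non-positive
    weights = [xi * math.exp(-step * (gi - shift)) for xi, gi in zip(x, grad)]
    total = sum(weights)
    return [w / total for w in weights]


def local_gradient(x, target):
    """Gradient of the hypothetical local loss 0.5 * ||x - target||^2."""
    return [xi - ti for xi, ti in zip(x, target)]


def global_loss(x, targets):
    """Average of the local losses over all nodes."""
    per_node = [0.5 * sum((xi - ti) ** 2 for xi, ti in zip(x, t)) for t in targets]
    return sum(per_node) / len(targets)


def random_walk_mirror_descent(neighbours, targets, steps, seed=0):
    """Carry the model along a random walk, updating it with the visited node's data."""
    rng = random.Random(seed)
    dim = len(targets[0])
    x = [1.0 / dim] * dim  # start from the uniform point of the simplex
    node = 0
    for t in range(1, steps + 1):
        grad = local_gradient(x, targets[node])
        x = mirror_step(x, grad, step=1.0 / math.sqrt(t))  # assumed step-size schedule
        node = rng.choice(neighbours[node])  # move to a uniformly random neighbour
    return x


# Hypothetical network: a ring of five nodes, each holding one target vector.
ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
targets = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.8, 0.1, 0.1],
    [0.5, 0.3, 0.2],
    [0.9, 0.05, 0.05],
]

if __name__ == "__main__":
    x_final = random_walk_mirror_descent(ring, targets, steps=2000)
    print("final iterate:", x_final)
    print("global loss:", global_loss(x_final, targets))
```

Because consecutive nodes are neighbours on the graph, the gradients seen by the model are samples from a Markov chain rather than i.i.d. draws, which is the setting the abstract refers to.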