Distributed Stein Variational Gradient Descent (DSVGD) is a non-parametric distributed learning framework for federated Bayesian learning, in which multiple clients jointly train a machine learning model by exchanging a number of non-random, interacting particles with the server. Because communication resources are limited, selecting the clients with the most informative local learning updates can improve model convergence and communication efficiency. In this paper, we propose two client selection schemes for DSVGD, based on the Kernelized Stein Discrepancy (KSD) and the Hilbert Inner Product (HIP), respectively. For both schemes, we derive an upper bound on the per-iteration decrease of the global free energy, which is then minimized to speed up model convergence. We evaluate our schemes against conventional selection schemes in terms of model accuracy, convergence speed, and stability on various learning tasks and datasets.
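To make the KSD-based idea concrete, the following is a minimal sketch of how a Kernelized Stein Discrepancy between a shared particle set and a client's local target could be estimated and used to rank clients. It is not the scheme derived in the paper: the RBF kernel, the fixed bandwidth, the V-statistic estimator, the Gaussian local targets, and the rule "pick the client with the largest KSD" are all assumptions made for illustration, and the names `ksd_squared`, `select_client`, and `gaussian_score` are hypothetical.

```python
import numpy as np


def ksd_squared(particles, score_fn, bandwidth=1.0):
    """V-statistic estimate of the squared KSD under an RBF kernel.

    particles: array of shape (n, d), samples representing the current model.
    score_fn:  maps an (n, d) array to the (n, d) array of grad log p(x)
               for the target density p (assumed available in closed form).
    """
    x = np.asarray(particles, dtype=float)
    n, d = x.shape
    s = score_fn(x)                                # scores at each particle
    diff = x[:, None, :] - x[None, :, :]           # diff[i, j] = x_i - x_j
    sq_dist = np.sum(diff ** 2, axis=-1)
    h2 = bandwidth ** 2
    k = np.exp(-sq_dist / (2.0 * h2))              # RBF kernel matrix

    # Four terms of the Stein kernel u_p(x_i, x_j).
    term_ss = (s @ s.T) * k
    term_s_grad_y = np.einsum("id,ijd->ij", s, diff) / h2 * k
    term_grad_x_s = -np.einsum("jd,ijd->ij", s, diff) / h2 * k
    term_trace = k * (d / h2 - sq_dist / h2 ** 2)
    return float(np.mean(term_ss + term_s_grad_y + term_grad_x_s + term_trace))


def select_client(particles, client_score_fns, bandwidth=1.0):
    """Illustrative rule (an assumption): choose the client whose local
    target is farthest, in KSD, from the current particles."""
    scores = [ksd_squared(particles, f, bandwidth) for f in client_score_fns]
    return int(np.argmax(scores)), scores


def gaussian_score(mean):
    """Score function of N(mean, I), used here as a toy local target."""
    mean = np.asarray(mean, dtype=float)
    return lambda x: -(x - mean)


# Toy usage: particles drawn from N(0, I) and three hypothetical clients.
rng = np.random.default_rng(0)
particles = rng.standard_normal((200, 2))
clients = [gaussian_score([0.0, 0.0]),
           gaussian_score([1.0, 0.0]),
           gaussian_score([3.0, 0.0])]
chosen, ksd_values = select_client(particles, clients)
print("selected client:", chosen)
print("squared KSD per client:", ksd_values)
```

In this toy setting the client whose target lies farthest from the particles receives the largest discrepancy. A practical implementation would need the score of each client's actual local posterior and a data-driven bandwidth, neither of which is specified in the abstract.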