We propose a novel hierarchical Bayesian approach to Federated Learning (FL). Our model describes the generative process of clients' local data through hierarchical Bayesian modeling: each client's local model is a random variable governed by a higher-level global variate. Variational inference in this model leads to an optimisation problem whose block-coordinate descent solution is a distributed algorithm that is separable over clients and never requires them to reveal their private data, making it fully compatible with FL. The block-coordinate algorithm also takes particular forms that subsume well-known FL algorithms, including Fed-Avg and Fed-Prox, as special cases. Beyond the modeling and derivations, we provide a convergence analysis showing that our block-coordinate FL algorithm converges to a (local) optimum of the objective at a rate of $O(1/\sqrt{t})$, the same rate as regular (centralised) SGD. We also provide a generalisation error analysis proving that the test error of our model on unseen data vanishes as the training data size increases, so the model is asymptotically optimal.
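To make the alternation described in the abstract concrete, the following is a minimal sketch of a block-coordinate round structure in which clients update their local models and a server then updates the global one. It does not reproduce the variational updates of the proposed method. It only illustrates the two special cases the abstract names, under stated assumptions: a toy one-dimensional regression problem, a Fed-Prox-style proximal weight `mu` (with `mu = 0` giving a Fed-Avg-style round), and hypothetical helper names `local_update`, `server_update`, `run_rounds`, and `make_clients`.

```python
import random


def local_update(w_global, data, mu, lr=0.1, steps=20):
    """Client block: gradient steps on the local squared loss plus a
    proximal term (mu / 2) * (w - w_global)^2. Only the client sees `data`."""
    w = w_global
    for _ in range(steps):
        # Gradient of the mean of 0.5 * (w * x - y)^2 over the local data.
        grad = sum((w * x - y) * x for x, y in data) / len(data)
        # Gradient of the proximal term; vanishes when mu == 0 (Fed-Avg style).
        grad += mu * (w - w_global)
        w -= lr * grad
    return w


def server_update(local_ws, sizes):
    """Server block: average of local models weighted by local data size."""
    total = sum(sizes)
    return sum(w * n for w, n in zip(local_ws, sizes)) / total


def run_rounds(clients, mu=0.0, rounds=30):
    """Alternate client and server blocks; only model values leave a client."""
    w_global = 0.0
    for _ in range(rounds):
        local_ws = [local_update(w_global, data, mu) for data in clients]
        w_global = server_update(local_ws, [len(data) for data in clients])
    return w_global


def make_clients(seed=0, n_clients=4, n_points=25, true_w=2.0):
    """Toy heterogeneous data (an assumption for illustration): each client
    has its own slope, drawn close to a shared value `true_w`."""
    rng = random.Random(seed)
    clients = []
    for _ in range(n_clients):
        shift = rng.uniform(-0.2, 0.2)  # client-specific deviation
        data = []
        for _ in range(n_points):
            x = rng.uniform(-1.0, 1.0)
            y = (true_w + shift) * x + rng.gauss(0.0, 0.05)
            data.append((x, y))
        clients.append(data)
    return clients


if __name__ == "__main__":
    clients = make_clients()
    print("Fed-Avg style (mu = 0):   ", run_rounds(clients, mu=0.0))
    print("Fed-Prox style (mu = 0.1):", run_rounds(clients, mu=0.1))
```

In this sketch a larger `mu` keeps each local model closer to the current global value, which loosely mirrors the role of a global variate governing the local ones. The actual coupling in the proposed method is derived from the variational objective rather than chosen by hand.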