Federated Learning with Uncertainty via Distilled Predictive Distributions

Most existing federated learning methods cannot estimate model or predictive uncertainty, because client models are trained by standard loss minimization, which ignores such uncertainty. In many situations, however, especially in limited-data settings, it is beneficial to account for the uncertainty in the model parameters at each client: doing so leads to more accurate predictions, and reliable uncertainty estimates can be used for tasks such as out-of-distribution (OOD) detection and for sequential decision-making tasks such as active learning. We present a framework for federated learning with uncertainty in which, in each round, each client infers the posterior distribution over its parameters as well as the posterior predictive distribution (PPD), distills the PPD into a single deep neural network, and sends this network to the server. Unlike some recent Bayesian approaches to federated learning, our approach does not require sending the whole posterior distribution of the parameters from each client to the server, only the PPD distilled into a deep neural network. In addition, because our approach always maintains the PPD in the form of a single deep neural network, it does not require computationally expensive Monte Carlo averaging over the posterior distribution when making predictions at test time. Moreover, our approach makes no restrictive assumptions about the form of the clients' posterior distributions or of their PPDs. We evaluate our approach on classification, active learning, and OOD detection in federated settings, where it outperforms various existing federated learning baselines.
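The client-side step described above (estimate the PPD from the posterior, then distill it into a single model) can be illustrated with a minimal sketch. This is not the paper's implementation: the one-dimensional logistic model, the Gaussian stand-in for posterior samples, the input grid, and the names `mc_ppd` and `distill` are all assumptions chosen to keep the example small, and the server-side aggregation is omitted because the abstract does not specify it.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mc_ppd(x, samples):
    """Monte Carlo estimate of p(y=1 | x, data): average the per-sample
    predictions over the posterior samples (w, b)."""
    return sum(sigmoid(w * x + b) for w, b in samples) / len(samples)


def distill(xs, targets, steps=2000, lr=0.5):
    """Fit a single student model (w, b) to the soft PPD targets by
    gradient descent on the cross-entropy loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, t in zip(xs, targets):
            err = sigmoid(w * x + b) - t  # d(cross-entropy)/d(logit)
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


rng = random.Random(0)

# ASSUMPTION: stand-in for posterior samples of a client's parameters.
# A real client would obtain these from an approximate inference method.
posterior_samples = [(rng.gauss(2.0, 0.5), rng.gauss(0.0, 0.3))
                     for _ in range(200)]

# ASSUMPTION: a fixed grid of inputs on which the PPD is distilled.
xs = [i / 10.0 for i in range(-30, 31)]

# Teacher: Monte Carlo PPD, which needs all 200 samples per prediction.
ppd = [mc_ppd(x, posterior_samples) for x in xs]

# Student: one model trained to reproduce the teacher's soft predictions.
student_w, student_b = distill(xs, ppd)
student_pred = [sigmoid(student_w * x + student_b) for x in xs]

# Largest disagreement between the distilled model and the Monte Carlo PPD.
max_gap = max(abs(p - q) for p, q in zip(ppd, student_pred))
print("student parameters:", student_w, student_b)
print("max |PPD - student|:", max_gap)
```

In this sketch, a test-time prediction from the student costs one model evaluation, whereas `mc_ppd` evaluates the model once per posterior sample. The student here has the same form as each posterior sample, so it can only approximate the averaged prediction. The abstract's framework uses a deep neural network as the distilled model.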
