In this paper, it is shown, for the first time, that centralized performance is achievable in decentralized learning without sharing the local datasets. Specifically, when clients adopt an empirical risk minimization with relative-entropy regularization (ERM-RER) learning framework and forward-backward communication between clients is established, it suffices to share the locally obtained Gibbs measures to achieve the same performance as a centralized ERM-RER with access to all the datasets. The core idea is that the Gibbs measure produced by client~$k$ is used as the reference measure by client~$k+1$. This establishes a principled way to encode prior information through a reference measure. In particular, achieving centralized performance in the decentralized setting requires a specific scaling of the regularization factors with the local sample sizes. Overall, this result opens the door to novel decentralized learning paradigms that shift the collaboration strategy from sharing data to sharing the local inductive bias via reference measures over the set of models.
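The chaining idea can be illustrated numerically on a finite model set. The sketch below is not taken from the paper: it assumes the standard Gibbs form of the ERM-RER solution, $P(\theta) \propto Q(\theta)\exp(-\mathsf{L}(\theta)/\lambda)$, a hypothetical squared-error loss over constant predictors, and the scaling $\lambda_k = \lambda\, n / n_k$ (with $n_k$ the local sample size and $n = \sum_k n_k$), which is one scaling under which the chained measure coincides with the centralized one. All function and variable names are illustrative, and only the forward pass of the communication is shown.

```python
import math

def gibbs(reference, risks, lam):
    """Gibbs measure on a finite model set: P(m) proportional to Q(m) * exp(-risk(m) / lam)."""
    weights = [q * math.exp(-r / lam) for q, r in zip(reference, risks)]
    total = sum(weights)
    return [w / total for w in weights]

def empirical_risks(models, data):
    """Mean squared error of each constant predictor m on one dataset (illustrative loss)."""
    return [sum((m - y) ** 2 for y in data) / len(data) for m in models]

# Hypothetical setup: four candidate models, a uniform prior, three clients.
models = [0.0, 0.5, 1.0, 1.5]
prior = [0.25, 0.25, 0.25, 0.25]
datasets = [[0.9, 1.1, 1.3], [0.4, 0.8], [1.0, 1.2, 0.7, 1.4]]
lam = 0.5                              # centralized regularization factor
n = sum(len(d) for d in datasets)      # total sample size

# Centralized ERM-RER: one Gibbs measure computed from the pooled data.
pooled = [y for d in datasets for y in d]
centralized = gibbs(prior, empirical_risks(models, pooled), lam)

# Decentralized chain: client k+1 uses client k's Gibbs measure as its reference.
# Assumed scaling: lam_k = lam * n / n_k, so each local risk is weighted by n_k / n.
measure = prior
for d in datasets:
    lam_k = lam * n / len(d)
    measure = gibbs(measure, empirical_risks(models, d), lam_k)
decentralized = measure

# Largest pointwise difference between the two measures (floating-point noise only).
max_gap = max(abs(a - b) for a, b in zip(centralized, decentralized))
```

Under these assumptions the two measures agree up to floating-point error, because the exponents accumulated along the chain sum to the pooled empirical risk divided by $\lambda$; only the Gibbs measures, never the datasets, pass between clients.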